Abstract

We study the natural question of constructing pseudorandom generators (PRGs) for low-degree
polynomial threshold functions (PTFs). We give a PRG with seed length $\log n/\epsilon^{O(d)}$
fooling degree $d$ PTFs with error at most $\epsilon$. Previously, no nontrivial constructions were known
even for quadratic threshold functions and constant error $\epsilon$. For the class of degree 1 threshold
functions, or halfspaces, we construct PRGs with much better dependence on the error parameter
$\epsilon$ and obtain a PRG with seed length $O(\log n + \log^2(1/\epsilon))$. Previously, only PRGs with seed
length $O(\log n \log^2(1/\epsilon)/\epsilon^2)$ were known for halfspaces. We also obtain PRGs with similar seed
lengths for fooling halfspaces over the $n$-dimensional unit sphere.

The main theme of our constructions and analysis is the use of invariance principles to construct
pseudorandom generators. We also introduce the notion of monotone read-once branching
programs, which is key to improving the dependence on the error rate $\epsilon$ for halfspaces. These
techniques may be of independent interest.
1 Introduction



Polynomial threshold functions are a fundamental class of functions with many important applications
in complexity theory [Bei93], learning theory [KS04], quantum complexity theory [BBC+01],
voting theory [ABFR94] and more. A polynomial threshold function (PTF) of degree $d$ is a function
$f : \{1,-1\}^n \to \{1,-1\}$ of the form $f(x) = \mathrm{sign}(P(x) - \theta)$, where $P : \{1,-1\}^n \to \mathbb{R}$ is a multi-linear
polynomial of degree $d$. Of particular importance is the class of degree 1 threshold functions, also
known as halfspaces, which have been instrumental in the development of many fundamental tools
in learning theory such as perceptrons, support vector machines and boosting.

Here we address the natural problem of explicitly constructing pseudorandom generators (PRGs) 
for PTFs. Derandomizing natural complexity classes is a fundamental problem in complexity the- 
ory, with several applications outside complexity theory. For instance, PRGs for PTFs facilitate 
estimating the accuracy of PTF classifiers in machine learning with a small number of deterministic 
samples; PRGs for spherical caps and PRGs for intersections of halfspaces can help derandomize 
randomized algorithms such as the Goemans-Williamson Max-Cut algorithm.

In this work, we give the first nontrivial pseudorandom generators for low-degree PTFs. 

Definition 1.1. A function $G : \{0,1\}^r \to \{1,-1\}^n$ is a PRG with error $\epsilon$ for (or $\epsilon$-fools) PTFs of
degree $d$, if

$$\left| \mathbb{E}_{x \in_u \{1,-1\}^n}[f(x)] - \mathbb{E}_{y \in_u \{0,1\}^r}[f(G(y))] \right| \le \epsilon,$$

for all PTFs $f$ of degree at most $d$. (Here $x \in_u S$ denotes a uniformly random element of $S$.)
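
To make Definition 1.1 concrete, the following minimal Python sketch computes the fooling error of a generator against a single halfspace by brute-force enumeration. It is only an illustration under stated assumptions: the helper names (`halfspace`, `fooling_error`, `toy_generator`) are hypothetical, the toy generator is not one of the constructions of this paper, and the convention that `sign` returns 1 on nonnegative input is a choice made here.

```python
from itertools import product

def halfspace(w, theta):
    """Return f(x) = sign(<w, x> - theta) as a +/-1 valued function.

    Assumption: sign(0) is taken to be +1.
    """
    def f(x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1
    return f

def expectation(f, points):
    """Average of f over an explicit collection of points."""
    points = list(points)
    return sum(f(p) for p in points) / len(points)

def fooling_error(f, G, n, r):
    """|E_x f(x) - E_y f(G(y))| by exhaustive enumeration (small n and r only)."""
    truth = expectation(f, product([1, -1], repeat=n))
    pseudo = expectation(f, (G(y) for y in product([0, 1], repeat=r)))
    return abs(truth - pseudo)

def toy_generator(y):
    """Hypothetical toy G: {0,1}^2 -> {1,-1}^3; the third sign is the product of the first two."""
    a, b = 1 - 2 * y[0], 1 - 2 * y[1]
    return (a, b, a * b)

majority = halfspace([1, 1, 1], 0)
single_point = halfspace([1, 1, -1], 2.5)   # accepts only x = (1, 1, -1)

print(fooling_error(majority, toy_generator, n=3, r=2))
print(fooling_error(single_point, toy_generator, n=3, r=2))
```

The toy generator stretches 2 bits to 3 but is far from fooling majority, which illustrates that a short seed alone is not enough; the error must be small for every function in the class.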

It can be shown by the probabilistic method that there exist PRGs that $\epsilon$-fool degree $d$ PTFs
with seed length $r = O(d\log n + \log(1/\epsilon))$ (see Appendix A). However, despite their long history,
until recently very little was known about explicitly constructing such PRGs, even for the special 
class of halfspaces. 

In this work, we present a PRG that $\epsilon$-fools degree $d$ PTFs with seed length $\log n/\epsilon^{O(d)}$.
Previously, PRGs with seed length $o(n)$ were not known even for degree 2 PTFs and constant $\epsilon$.

Theorem 1.2. For $0 < \epsilon < 1$, there exists an explicit PRG fooling PTFs of degree $d$ with error at
most $\epsilon$ and seed length $2^{O(d)} \log n/\epsilon^{8d+3}$.

Independent of our work, Diakonikolas et al. [DKN10] showed that bounded independence fools
degree 2 PTFs and in particular give a PRG with seed length $(\log n) \cdot \tilde{O}(1/\epsilon^9)$ for degree 2 PTFs
(here $\tilde{O}$ hides poly-logarithmic factors). In another independent work, Ben-Eliezer et al. [BELY09]
showed that bounded independence fools certain special classes of PTFs.

For the $d = 1$ case of halfspaces, Diakonikolas et al. [DGJ+09] constructed PRGs with seed
length $O(\log n)$ for constant error rates. PRGs with seed length $O(\log^2 n)$ for halfspaces with
polynomially bounded weights follow easily from known results. However, nothing nontrivial was
known for general halfspaces, for instance, when $\epsilon = 1/\sqrt{n}$. In this work we construct PRGs with
exponentially better dependence on the error parameter $\epsilon$.

Theorem 1.3. For all constants $c$ and all $\epsilon \ge 1/n^c$, there exists an explicit PRG fooling halfspaces with
error at most $\epsilon$ and seed length $O(\log n + \log^2(1/\epsilon))$.

We also obtain results similar to the above for spherical caps. The problem of constructing 
PRGs for spherical caps was brought to our attention by Amir Shpilka; Karnin et al. [KRS09] were 
the first to obtain a PRG with similar parameters using different methods. 
Theorem 1.4. There exists a constant $c > 0$ such that for all $\epsilon > c\log n/n^{1/4}$, there exists an
explicit PRG fooling spherical caps with error at most $\epsilon$ and seed length $O(\log n + \log^2(1/\epsilon))$.

We briefly summarize the previous constructions for halfspaces. 

1. Halfspaces with polynomially bounded integer weights can be computed by polynomial width 
read-once branching programs (ROBPs). Thus, the PRGs for ROBPs such as those of Nisan 
[Nis92] and Impagliazzo et al. [INW94] fool halfspaces with polynomially bounded integer 
weights with seed length $O(\log^2 n)$. However, a simple counting argument ([MT94], [Has94])
shows that almost all halfspaces have exponentially large weights. 

2. Diakonikolas et al. [DGJ+09] showed that $k$-wise independent spaces fool halfspaces for $k =
O(\log^2(1/\epsilon)/\epsilon^2)$. By using the known efficient constructions of $k$-wise independent spaces
they obtain PRGs for halfspaces with seed length $O(\log n \log^2(1/\epsilon)/\epsilon^2)$.

3. Rabani and Shpilka [RS09] gave explicit constructions of polynomial size hitting sets for 
halfspaces. 

The overarching theme behind all our constructions is the use of invariance principles to get 
pseudorandom generators. Broadly speaking, invariance principles for a class of functions say that 
under mild conditions (typically on the first few moments) the distribution of the functions is 
essentially invariant for all product distributions. Intuitively, invariance principles could be helpful 
in constructing pseudorandom generators as we can hope to exploit the invariance with respect 
to product distributions by replacing a product distribution with a "smaller product distribution" 
that still satisfies the conditions for applying the invariance principle. We believe that the above 
technique could be helpful for other derandomization problems. 

Another aspect of our constructions is what we call the "monotone trick". The PRGs for small-width
read-once branching programs (ROBPs) from the works of Nisan [Nis92], Impagliazzo et
al. [INW94], and Nisan and Zuckerman [NZ96] have been a fundamental tool in derandomization with
several applications [Siv02], [RV05], [GR09]. An important ingredient in our PRG for halfspaces is
our observation that any PRG for small-width ROBPs fools arbitrary width "monotone" ROBPs.
Roughly speaking, we say a ROBP is monotone if there exists an ordering on the nodes in each 
layer of the program so that the corresponding sets of accepting strings respect the ordering (see 
Definition 2.3). We believe that this notion of monotone ROBP is quite natural and combined with 
the "monotone trick" could be useful elsewhere. 

The above techniques have recently found other applications that we briefly describe in Sec- 
tion 1.2. We now give a high level view of our constructions and their analysis. 

1.1 Outline of Constructions 

Our constructions build mainly on the hitting set construction for halfspaces of Rabani and Shpilka. 
Although the constructions and analysis are similar in spirit for halfspaces and higher degree PTFs, 
for clarity, we deal with the two classes separately, at the cost of some repetition. The analysis is 
simpler for halfspaces and provides intuition for the more complicated analysis for higher degree 
PTFs. 

1.1.1 PRGs for Halfspaces 

Our first step in constructing PRGs for halfspaces is to use our "monotone trick" to show that PRGs 
for polynomial width read-once branching programs (ROBPs) also fool halfspaces. Previously, 
PRGs for polynomial width ROBPs were only known to fool halfspaces with polynomially bounded
weights. Although the natural simulation of halfspaces by ROBP may require polynomially large 
width, we note that the resulting ROBP is what we call monotone (see Definition 2.3). We show 
that PRGs for polynomial width ROBP fool monotone ROBPs of arbitrary width. 

Theorem 1.5. A PRG that $\delta$-fools monotone ROBPs of width $\log(4T/\epsilon)$ and length $T$ fools monotone
ROBPs of arbitrary width and length $T$ with error at most $\epsilon + \delta$.

See Theorem 2.4 for a more formal statement. As a corollary we get the following. 

Corollary 1.6. For all $\epsilon > 0$, a PRG that $\delta$-fools width $\log(4n/\epsilon)$ ROBPs fools halfspaces with
error at most $\epsilon + \delta$.

The above result already improves on the previous constructions for small $\epsilon$, giving a PRG
with seed length $O(\log^2 n)$ for $\epsilon = 1/\mathrm{poly}(n)$. However, the randomness used is $O(\log^2 n)$ even for
constant $\epsilon$.

We next improve the dependence of the seed length on the error parameter $\epsilon$ to obtain our main
results for fooling halfspaces. Following the approach of Diakonikolas et al. we first construct PRGs
fooling regular halfspaces. A halfspace with coefficients $(w_1, \ldots, w_n)$ is regular if no coefficient is
significantly larger than the others. Such halfspaces are easier to analyze because for regular $w$, the
distribution of $\langle w, x\rangle$ with $x$ uniformly distributed in $\{1,-1\}^n$ is close to a normal distribution by
the Central Limit Theorem. Using a quantitative form of the above statement, the Berry-Esseen 
theorem, we show that a simplified version of the hitting set construction of Rabani and Shpilka 
gives a PRG fooling regular halfspaces. 

Having fooled regular halfspaces, we use the structural results on halfspaces of Servedio [Ser06] 
and Diakonikolas et al. [DGJ+09] to fool arbitrary halfspaces. The structural results of Servedio
and Diakonikolas et al. roughly show that either a halfspace is regular or is close to a function 
depending only on a small number of coordinates. Given this, we proceed by a case analysis as 
in Diakonikolas et al.: if a halfspace is regular, we use the analysis for regular halfspaces; else, we 
argue that bounded independence suffices. 

The above analysis gives a PRG fooling halfspaces with seed length $O(\log n \log^2(1/\epsilon)/\epsilon^2)$, matching
the PRG of Diakonikolas et al. [DGJ+09]. However, not only is our construction simpler to
analyze (for the regular case), but we can also apply our "monotone trick" to derandomize the
construction. Derandomizing using the PRG for ROBPs of Impagliazzo et al. [INW94] gives
Theorem 1.3.

For spherical caps, we give a simpler, more direct construction based on our generator for regular
halfspaces. We use an idea of Ailon and Chazelle [AC06] and the invariance of spherical caps with 
respect to unitary rotations to convert the case of arbitrary spherical caps to regular spherical caps. 
We defer the details to Section 6. 

1.1.2 PRGs for PTFs 

We next extend our PRG for halfspaces to fool higher degree polynomial threshold functions. The 
construction we use to fool PTFs is a natural extension of our underandomized PRG for halfspaces.
The analysis, though similar in outline, is significantly more complicated and at a high level proceeds 
as follows. 

As was done for halfspaces we first study the case of regular PTFs. The mainstay of our analysis 
for regular halfspaces is the Berry-Esseen theorem for sums of independent random variables. By 
using the generalized Berry-Esseen type theorem, or invariance principle, for low-degree multi-linear 
polynomials, proved by Mossel et al. [MOO05], we extend our analysis for regular halfspaces to 
regular PTFs. We remark that unlike the case for halfspaces, we cannot use the invariance principle
of Mossel et al. directly, but instead adapt their proof technique for our generator. In particular, 
we crucially use the fact that most of the arguments of Mossel et al. work even for distributions 
with bounded independence. 

We then use structural results for PTFs of Diakonikolas et al. [DSTW10] and Harsha et 
al. [HKM09] that generalize the results of Servedio [Ser06] and Diakonikolas et al. [DGJ+09] for
halfspaces. Roughly speaking, these results show the following: with at least a constant probabil- 
ity, upon randomly restricting a small number of variables, the resulting restricted PTF is either 
regular or has high bias. However, we cannot yet use the above observation to do a case analysis 
as was done for halfspaces; instead, we give a more delicate argument with recursive application of 
the results on random restrictions. 

1.2 Other Applications 

Gopalan et al. [GOWZ10] showed that our generator, when suitably modified, fools arbitrary functions
of $d$ halfspaces under product distributions where each coordinate has bounded fourth moment.
To $\epsilon$-fool any size-$s$, depth-$d$ decision tree of halfspaces, their generator uses seed length
$O((d\log(ds/\epsilon) + \log n) \cdot \log(ds/\epsilon))$. For monotone functions of $k$ halfspaces, their seed length
becomes $O((k\log(k/\epsilon) + \log n) \cdot \log(k/\epsilon))$. They get better bounds for larger $\epsilon$; for example, to
$1/\mathrm{poly}(\log n)$-fool all monotone functions of $(\log n)/\log\log n$ halfspaces, their generator requires a
seed of length just $O(\log n)$.

Building on techniques from this work and a new invariance principle for polytopes, Harsha 
et al. [HKM10] obtained pseudorandom generators that $\epsilon$-fool certain classes of intersections of $k$
halfspaces with seed length $(\log n) \cdot \mathrm{poly}(\log k, 1/\epsilon)$. As an application of their results, Harsha et
al. obtained the first deterministic quasi-polynomial time approximate-counting algorithms for a 
large class of integer programs. 

In other subsequent work, Gopalan et al. [GKM10] used ideas motivated by the monotone trick 
to give the first deterministic polynomial time, relative error approximate-counting algorithms for 
knapsack and related problems. 

We first present our result on fooling arbitrary width monotone ROBPs with PRGs for small- 
width ROBPs. 

2 PRGs for Monotone ROBPs 

We start with some definitions. 

Definition 2.1 (ROBP). An $(S, D, T)$-branching program $M$ is a layered multi-graph with a layer
for each $0 \le i \le T$ and at most $2^S$ vertices (states) in each layer. The first layer has a single vertex
$v_0$ and each vertex in the last layer is labeled with 0 (rejecting) or 1 (accepting). For $0 \le i < T$,
a vertex $v$ in layer $i$ has at most $2^D$ outgoing edges each labeled with an element of $\{0,1\}^D$ and
ending at a vertex in layer $i + 1$.

Note that by definition, an (S, D, T)-branching program is read-once. We also use the following 
notation. Let $M$ be an $(S, D, T)$-branching program and $v$ a vertex in layer $i$ of $M$.

1. For $z = (z^i, z^{i+1}, \ldots, z^T) \in (\{0,1\}^D)^{T+1-i}$ call $(v, z)$ an accepting pair if starting from $v$ and
traversing the path with edges labeled $z$ in $M$ leads to an accepting state.

2. For $z \in (\{0,1\}^D)^T$, let $M(z) = 1$ if $(v_0, z)$ is an accepting pair, and $M(z) = 0$ otherwise.

3. $A_M(v) = \{z : (v, z) \text{ is accepting in } M\}$ and $P_M(v)$ is the probability that $(v, z)$ is an accepting
pair for $z$ chosen uniformly at random.

4. For brevity, let $U$ denote the uniform distribution over $(\{0,1\}^D)^T$.
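
The notation above can be illustrated with a small Python sketch of a layered branching program together with the quantities $M(z)$ and $P_M(v)$. This is a hedged illustration, not part of the constructions: the class name `BranchingProgram` and its layout are hypothetical, and the sketch assumes every non-final vertex has an outgoing edge for each of the $2^D$ labels, so that a uniform label is a uniform choice among its edges.

```python
class BranchingProgram:
    """Layered read-once program; edges[i][v][label] is the next vertex.

    Layer 0 contains the single start vertex 0, and `accept` is the set of
    accepting vertices in the last layer. Labels are tuples in {0,1}^D.
    """

    def __init__(self, edges, accept):
        self.edges = edges
        self.accept = set(accept)
        self.T = len(edges)          # number of edge layers

    def run(self, z, start=0, layer=0):
        """Follow the path labeled z from `start` in `layer`; return 1 or 0."""
        v = start
        for i, label in enumerate(z, start=layer):
            v = self.edges[i][v][label]
        return 1 if v in self.accept else 0

    def accept_prob(self, v, layer):
        """P_M(v): probability that a uniformly random suffix from v accepts."""
        if layer == self.T:
            return 1.0 if v in self.accept else 0.0
        nxt = self.edges[layer][v]
        return sum(self.accept_prob(u, layer + 1) for u in nxt.values()) / len(nxt)


# Example with D = 1, T = 2: the program accepts exactly when both bits are 1.
and_program = BranchingProgram(
    edges=[
        {0: {(0,): 0, (1,): 1}},                          # layer 0 -> layer 1
        {0: {(0,): 0, (1,): 0}, 1: {(0,): 0, (1,): 1}},   # layer 1 -> layer 2
    ],
    accept={1},
)
print(and_program.run(((1,), (1,))), and_program.accept_prob(0, 0))
```

In this toy program the layer-1 vertices satisfy $A_M(0) \subseteq A_M(1)$, so the ordering $0 \prec 1$ is the kind of ordering used in the definition of monotone programs.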

Definition 2.2. A function $G : \{0,1\}^r \to (\{0,1\}^D)^T$ is said to $\epsilon$-fool $(S, D, T)$-branching programs
if, for all $(S, D, T)$-branching programs $M$,

$$\left| \Pr_{z \leftarrow U}[M(z) = 1] - \Pr_{y \in_u \{0,1\}^r}[M(G(y)) = 1] \right| \le \epsilon.$$

Nisan [Nis92] and Impagliazzo et al. [INW94] gave PRGs that fool $(S, D, T)$-branching programs
with error $\delta$ and seed length $r = O((S + D)\log T + \log(T/\delta)\log T)$. For $T = \mathrm{poly}(S, D)$, the PRG
of Nisan and Zuckerman [NZ96] fools $(S, D, T)$-branching programs with seed length $r = O(S + D)$.
Here we show that the above PRGs in fact fool arbitrary width monotone branching programs as 
defined below. 

Definition 2.3 (Monotone ROBP). An $(S, D, T)$-branching program $M$ is said to be monotone if
for all $0 \le i \le T$, there exists an ordering $\{v_1 \prec v_2 \prec \cdots \prec v_{L_i}\}$ of the vertices in layer $i$ such that
for $1 \le j < k \le L_i$, $A_M(v_j) \subseteq A_M(v_k)$.

Theorem 2.4. Let $0 < \epsilon < 1$ and $G : \{0,1\}^R \to (\{0,1\}^D)^T$ be a PRG that $\delta$-fools monotone
$(\log(4T/\epsilon), D, T)$-branching programs. Then $G$ fools monotone $(S, D, T)$-branching programs for
arbitrary $S$ with error at most $\epsilon + \delta$.

In particular, for $\delta = 1/\mathrm{poly}(T)$ the above theorem gives a PRG fooling monotone $(S, D, T)$-branching
programs with error at most $\delta + \epsilon$ and seed length $O(\log(1/\epsilon)\log T + D\log T + \log^2 T)$.
Note that the seed length does not depend on the space $S$. Given the above result, Corollary 1.6
follows easily.

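Proof of Corollary 1.6. A halfspace with weight vector $w \in \mathbb{R}^n$ and threshold $\theta \in \mathbb{R}$ can be naturally
computed by an $(S, 1, n)$-branching program $M_{w,\theta}$, for $S$ large enough, by letting the states in layer
$i$ correspond to the partial sums $\sum_{j=1}^{i} w_j x_j$. It is easy to check that $M_{w,\theta}$ is monotone: ordering the
states of a layer by their partial sums, a larger partial sum accepts every suffix that a smaller one accepts. The
corollary now follows from Theorem 2.4. □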
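
The monotonicity claim in the proof above can be checked exhaustively on small instances. The Python sketch below is only a sanity check under assumptions made here: the function names are hypothetical, inputs are taken in $\{1,-1\}$ directly rather than encoded as edge labels, and a state accepts a suffix when the completed sum is at least $\theta$.

```python
from itertools import product

def accepting_suffixes(w, theta, i, partial_sum):
    """Suffixes (x_{i+1}, ..., x_n) accepted when starting from state `partial_sum`."""
    tail = w[i:]
    return {
        x for x in product([1, -1], repeat=len(tail))
        if partial_sum + sum(a * b for a, b in zip(tail, x)) >= theta
    }

def layer_states(w, i):
    """All partial sums w_1 x_1 + ... + w_i x_i reachable after reading i inputs, sorted."""
    return sorted({sum(a * b for a, b in zip(w[:i], x))
                   for x in product([1, -1], repeat=i)})

def is_monotone(w, theta):
    """Check A(v_j) is a subset of A(v_k) whenever v_j has the smaller partial sum."""
    for i in range(len(w) + 1):
        sets = [accepting_suffixes(w, theta, i, s) for s in layer_states(w, i)]
        # Subset inclusion is transitive, so checking neighbours is enough.
        if any(not a <= b for a, b in zip(sets, sets[1:])):
            return False
    return True

print(is_monotone([1, 2, 4, 8], 3))
print(len(layer_states([1, 2, 4], 3)))   # number of distinct states in layer 3
```

With weights such as powers of two every sign pattern yields a distinct partial sum, so the number of states in a layer can double at each step; this is why a width-independent statement like Theorem 2.4 is needed.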

We now prove Theorem 2.4. The proof is based on the simple idea of "sandwiching" monotone
branching programs between small-width branching programs. To this end, let $M$ be a monotone
$(S, D, T)$-branching program and call a pair of $(s, D, T)$-branching programs $(M_{down}, M_{up})$
$\epsilon$-sandwiching for $M$ if the following hold.

1. For all $z \in (\{0,1\}^D)^T$, $M_{down}(z) \le M(z) \le M_{up}(z)$.

2. $\Pr_{z \leftarrow U}[M_{up}(z) = 1] - \Pr_{z \leftarrow U}[M_{down}(z) = 1] \le \epsilon$.

We first show that existence of small-width sandwiching branching programs suffices and then 
show the existence of small-width sandwiching branching programs for monotone branching pro- 
grams. Theorem 2.4 follows directly from the following two lemmas. 

Lemma 2.5. If a PRG $G$ $\delta$-fools $(s, D, T)$-branching programs, and there exist $(s, D, T)$-branching
programs $(M_{down}, M_{up})$ that are $\epsilon$-sandwiching for $M$, then $G$ $(\epsilon + \delta)$-fools $M$.
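Proof. Let $\mathcal{D}$ denote the output distribution of $G$. Then,

$$\Pr_{z \leftarrow U}[M_{down}(z) = 1] \le \Pr_{z \leftarrow U}[M(z) = 1], \qquad \Pr_{z \leftarrow \mathcal{D}}[M(z) = 1] \le \Pr_{z \leftarrow \mathcal{D}}[M_{up}(z) = 1].$$

Further, since $\mathcal{D}$ $\delta$-fools $M_{up}$,

$$\Pr_{z \leftarrow \mathcal{D}}[M_{up}(z) = 1] \le \Pr_{z \leftarrow U}[M_{up}(z) = 1] + \delta.$$

Thus,

$$\Pr_{z \leftarrow \mathcal{D}}[M(z) = 1] - \Pr_{z \leftarrow U}[M(z) = 1] \le \Pr_{z \leftarrow U}[M_{up}(z) = 1] - \Pr_{z \leftarrow U}[M_{down}(z) = 1] + \delta \le \epsilon + \delta.$$

By a similar argument with the roles of $M_{up}$, $M_{down}$ interchanged, we get

$$\left| \Pr_{z \leftarrow \mathcal{D}}[M(z) = 1] - \Pr_{z \leftarrow U}[M(z) = 1] \right| \le \epsilon + \delta.$$

□
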



Lemma 2.6. For any monotone $(S, D, T)$-branching program $M$, there exist $(\log(2T/\epsilon), D, T)$-branching
programs $(M_{down}, M_{up})$ that are $(2\epsilon)$-sandwiching for $M$.

Proof. We first set up some notation. For $0 \le i \le T$, let the vertices in layer $i$ in $M$ be $V^i = \{v^i_1 \prec
v^i_2 \prec \ldots \prec v^i_{L_i}\}$. Let $B^0 = \{v_0\}$ and for each $1 \le i \le T$, partition the vertices of layer $i$ into at
most $t_i \le 2T/\epsilon$ intervals $J^i_1 = \{v^i_1 = v^i_{i_1}, v^i_{i_1+1}, \ldots, v^i_{i_2-1}\}$, $J^i_2 = \{v^i_{i_2}, v^i_{i_2+1}, \ldots, v^i_{i_3-1}\}$, $\ldots$,
$J^i_{t_i-1} = \{v^i_{i_{t_i-1}}, v^i_{i_{t_i-1}+1}, \ldots, v^i_{i_{t_i}} = v^i_{L_i}\}$ so that for $1 \le k < t_i$,

$$P_M(v^i_{i_{k+1}}) - P_M(v^i_{i_k}) \le \epsilon/(2T) \quad \text{or} \quad i_{k+1} = i_k + 1. \qquad (2.1)$$

Let $B^i = \{1 = i_1, i_2, \ldots, i_{t_i} = L_i\}$ be the set of separating indices for the intervals $J^i_1, J^i_2, \ldots, J^i_{t_i-1}$.
Observe that, by definition, for any two nodes $v, v' \in J^i_k$ in the same interval,

$$|P_M(v) - P_M(v')| \le \frac{\epsilon}{2T}. \qquad (2.2)$$

Let $s = \log(2T/\epsilon)$ and define $(s, D, T)$-branching programs $M_{up}$, $M_{down}$ as follows. The vertices
in layer $i$ of $M_{up}$, $M_{down}$ are $v^i_j$ for $j \in B^i$ and the edges are placed by rounding the edges of $M$
upwards and downwards respectively as follows. For $j \in B^i$ suppose there is an edge labeled $z$
between $v^i_j$ and a vertex $v^{i+1} \in J^{i+1}_k$. If $|J^{i+1}_k| = 1$, we place an edge with label $z$ between $v^i_j$ and
$v^{i+1}$ in both $M_{up}$ and $M_{down}$. Otherwise, we place an edge with label $z$ from $v^i_j$ to $v^{i+1}_{i_{k+1}}$ in $M_{up}$ and
an edge with label $z$ from $v^i_j$ to $v^{i+1}_{i_k}$ in $M_{down}$. We will show that $M_{up}$, $M_{down}$ are $\epsilon$-sandwiching
for $M$.

Claim 2.7. For $0 \le i \le T$ and $j \in B^i$, $A_{M_{down}}(v^i_j) \subseteq A_M(v^i_j) \subseteq A_{M_{up}}(v^i_j)$. In particular, for any
$z$, $M_{down}(z) \le M(z) \le M_{up}(z)$.

Proof. Follows from the monotonicity of M. □ 

Claim 2.8. For $0 \le i \le T$ and $j \in B^i$, $P_{M_{up}}(v^i_j) - P_{M_{down}}(v^i_j) \le \frac{(T-i)\epsilon}{T}$. In particular, for $z$
chosen uniformly at random, $\Pr[M_{up}(z) = 1] - \Pr[M_{down}(z) = 1] \le 2\epsilon$.
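Proof. The second part of the claim follows from the first. We will show the first part by showing
the following: for $0 \le i \le T$ and $j \in B^i$,

$$|P_{M_{down}}(v^i_j) - P_M(v^i_j)| \le \frac{(T-i)\epsilon}{2T}, \qquad |P_{M_{up}}(v^i_j) - P_M(v^i_j)| \le \frac{(T-i)\epsilon}{2T}. \qquad (2.3)$$

We prove the first equation above; the second equation can be proved similarly. The proof is by
downward induction on $i$. For $i = T$, the statement is true trivially. Now, suppose the claim is
true for all layers $i' \ge i + 1$. Let $v = v^i_j \in V^i$ for $j \in B^i$ and let $z = (z^i, \bar{z})$ be uniformly chosen from
$(\{0,1\}^D)^{T+1-i}$ with $z^i \in_u \{0,1\}^D$. Let $w = \Gamma(v, z^i)$ be the vertex reached by taking the edge labeled
$z^i$ from $v$ in $M$ and let $w \in J^{i+1}_k$. Then, the edge labeled $z^i$ from $v$ goes to $v^{i+1}_{i_k}$ in $M_{down}$. Now,
by Equation (2.2), $|P_M(w) - P_M(v^{i+1}_{i_k})| \le \epsilon/2T$. Therefore, for $j \in B^i$, writing $k(u)$ for the index of
the interval of layer $i+1$ containing $\Gamma(v^i_j, u)$,

$$\begin{aligned}
P_M(v^i_j) &= \sum_{u \in \{0,1\}^D} \Pr[z^i = u]\, P_M(\Gamma(v^i_j, u)) \\
&\le \sum_{u \in \{0,1\}^D} \Pr[z^i = u] \left( P_M(v^{i+1}_{i_{k(u)}}) + \frac{\epsilon}{2T} \right) \\
&\le \sum_{u \in \{0,1\}^D} \Pr[z^i = u] \left( P_{M_{down}}(v^{i+1}_{i_{k(u)}}) + \frac{(T-i-1)\epsilon}{2T} + \frac{\epsilon}{2T} \right) && \text{(Induction hypothesis)} \\
&= \sum_{u \in \{0,1\}^D} \Pr[z^i = u]\, P_{M_{down}}(v^{i+1}_{i_{k(u)}}) + \frac{(T-i)\epsilon}{2T} \\
&= P_{M_{down}}(v^i_j) + \frac{(T-i)\epsilon}{2T} && \text{(Definition of } M_{down}\text{).}
\end{aligned}$$

Since by Claim 2.7, $P_{M_{down}}(v^i_j) \le P_M(v^i_j)$, Equation (2.3) now follows from the above equation and
induction. □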

Lemma 2.6 now follows from Claims 2.7, 2.8. □ 



3 Main Generator Construction 



We now describe our main construction $G$ that serves as a blueprint for all of our constructions. The
generator $G$ is essentially a simplification of the hitting set construction for halfspaces of Rabani
and Shpilka [RS09]. We use the following building blocks.

1. A family $\mathcal{H} = \{h : [n] \to [t]\}$ of hash functions that is $\alpha$-pairwise independent. That is, for a
fixed $k \in [t]$ and $i \ne j \in [n]$,

$$\Pr_{h \in_u \mathcal{H}}[h(i) = k \wedge h(j) = k] \le \frac{1 + \alpha}{t^2}. \qquad (3.1)$$

Efficient constructions of size $|\mathcal{H}| = O(nt)$ are known for any constant $\alpha$, even $\alpha = 0$.

2. A generator $G_0 : \{0,1\}^{r_0} \to \{1,-1\}^m$ of a $\delta$-almost $k$-wise independent space over $\{1,-1\}^m$.
We say a distribution $\mathcal{D}$ over $\{1,-1\}^m$ is $\delta$-almost $k$-wise independent if, for all $\{i_1, \ldots, i_k\} \subseteq [m]$,

$$\sum_{b_1, \ldots, b_k \in \{1,-1\}^k} \left| \Pr_{x \leftarrow \mathcal{D}}[x_{i_1} = b_1, \ldots, x_{i_k} = b_k] - \frac{1}{2^k} \right| \le \delta.$$

Efficient generators $G_0$ as above with seed length $r_0 = O(k + \log m + \log(1/\delta))$ are known
[NN93].

Although efficient constructions of hash families $\mathcal{H}$ and generators $G_0$ as above are known even
for $\alpha = 0$, $\delta = 0$ and constant $k$, we work with small but non-zero $\alpha$, $\delta$, as we will need the more
general objects for our analysis.

The basic idea behind the generator is as follows. We first use the hash functions to distribute 
the coordinates ([n]) into buckets. The purpose of this step is to spread out the "influences" of 
the coordinates across buckets. Then, for each bucket we use an independently chosen sample 
from a $\delta$-almost $k$-wise independent distribution to generate the bits for the coordinate positions
mapped to the bucket. The purpose of this step is, roughly, to "match the first few moments" 
of functions restricted to the coordinates in each bucket. The hope then is to subsequently use 
invariance principles to show closeness in distribution. 

Fix the error parameter $\epsilon > 0$ and let $t = \mathrm{poly}(\log(1/\epsilon))/\epsilon^2$ to be chosen later. Let $m = n/t$
(assuming without loss of generality that $t$ divides $n$) and let $\mathcal{H}$ be an $\alpha$-pairwise independent
hash family. To avoid some technicalities that can be overcome easily, we assume that every hash
function $h \in \mathcal{H}$ is evenly distributed, meaning $\forall h \in \mathcal{H}, i \in [t]$, $|\{j : h(j) = i, j \in [n]\}| = n/t$. Let
$G_0 : \{0,1\}^{r_0} \to \{1,-1\}^m$ generate a $\delta$-almost $k$-wise independent space for $\delta > \mathrm{poly}(\epsilon, 1/n)$ to be
chosen later.

Define $G : \mathcal{H} \times (\{0,1\}^{r_0})^t \to \{1,-1\}^n$ by

$$G(h, z^1, \ldots, z^t) = x, \quad \text{where } x|_{h^{-1}(i)} = G_0(z^i) \text{ for } i \in [t]. \qquad (3.2)$$
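
The bucket structure of Equation (3.2) can be sketched in a few lines of Python. This is a structural illustration only, under loud assumptions: `sample_even_hash` is a hypothetical stand-in that produces an evenly distributed map from a random shuffle (it is not the explicit $\alpha$-pairwise independent family of the text), and `toy_G0` is a placeholder driven by Python's `random` module rather than a $\delta$-almost $k$-wise independent generator.

```python
import random

def sample_even_hash(n, t, rng):
    """Toy stand-in for h: an evenly distributed map [n] -> [t] (assumes t divides n).

    Assumption: NOT the pairwise independent family of the text.
    """
    coords = list(range(n))
    rng.shuffle(coords)
    h = [0] * n
    for pos, j in enumerate(coords):
        h[j] = pos % t               # every bucket receives exactly n / t coordinates
    return h

def toy_G0(seed, m):
    """Placeholder for G_0: m signs in {1,-1} derived deterministically from `seed`."""
    rng = random.Random(seed)
    return [rng.choice([1, -1]) for _ in range(m)]

def G(h, seeds, n, t):
    """Fill the coordinates of bucket h^{-1}(i) with G_0(z^i), as in Equation (3.2)."""
    m = n // t
    x = [0] * n
    for i in range(t):
        bucket = [j for j in range(n) if h[j] == i]     # h^{-1}(i) in index order
        for j, sign in zip(bucket, toy_G0(seeds[i], m)):
            x[j] = sign
    return x

rng = random.Random(0)
h = sample_even_hash(12, 3, rng)
x = G(h, seeds=[11, 22, 33], n=12, t=3)
print(h, x)
```

The total seed consists of the description of $h$ plus $t$ independent seeds for $G_0$, which is where the seed length $\log|\mathcal{H}| + r_0 t$ comes from.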

We will show that for the parameters $t, \alpha, \delta, k$ and $\mathcal{H}$, $G_0$ chosen appropriately, the above generator
fools halfspaces as well as degree $d$ PTFs. In particular, we fool progressively stronger classes,
from halfspaces to degree $d$ PTFs, by choosing $\mathcal{H}$ and $G_0$ progressively stronger. The table below
gives a simplified summary of the results we get for different choices of $\mathcal{H}$, $G_0$. We define balanced
hash functions in Definition 4.9.



Hash family $\mathcal{H}$                     Generator $G_0$                     Fooling class
Pairwise independent                 4-wise independent                Regular halfspaces, Theorem 4.3
Pairwise independent and balanced    $\Theta(\log t)$-wise independent    Halfspaces, Theorem 4.11
Pairwise independent                 $4d$-wise independent               Regular degree $d$ PTFs, Theorem 5.2
Pairwise independent and balanced    $\Theta(t)$-wise independent         Degree $d$ PTFs, Theorem 5.17



4 PRGs for Halfspaces 

In this section we show that for appropriately chosen parameters, G fools halfspaces. We first 
show that $G$ fools "regular" halfspaces to obtain a PRG with seed length $O(\log n/\epsilon^2)$ for regular
halfspaces. We then extend the analysis to arbitrary halfspaces to get a PRG with seed length
$O(\log n \log^2(1/\epsilon)/\epsilon^2)$ and apply the monotone trick to prove Theorem 1.3.

In the following let $H_{w,\theta} : \{1,-1\}^n \to \{1,-1\}$ denote a halfspace $H_{w,\theta}(x) = \mathrm{sgn}(\langle w, x\rangle - \theta)$.
Unless stated otherwise, we assume throughout that a halfspace $H_{w,\theta}$ is normalized, meaning $\|w\| =
1$ (here $\|\cdot\|$ is the $\ell_2$-norm). We measure distance between real-valued distributions $P$, $Q$ by

$$d(P, Q) = \|\mathrm{CDF}(P) - \mathrm{CDF}(Q)\|_\infty = \sup_{t \in \mathbb{R}} \left| \Pr_{x \leftarrow P}[x < t] - \Pr_{x \leftarrow Q}[x < t] \right|,$$

also known as Kolmogorov-Smirnov distance. In particular, we say two real-valued distributions
$P$, $Q$ are $\epsilon$-close if $d(P, Q) \le \epsilon$. We use the fact that Kolmogorov-Smirnov distance is convex.
Lemma 4.1. For fixed $Q$, the distance function $d(P, Q)$ defined for probability distributions over
$\mathbb{R}$ is a convex function.

For $\sigma > 0$, let $\mathcal{N}(0, \sigma)$ denote the normal distribution with mean 0 and variance $\sigma^2$. We also
assume that $\epsilon > 1/n^{.49}$ as otherwise, Theorem 1.3 follows from Corollary 1.6.



4.1 PRGs for Regular Halfspaces 

As was done in Diakonikolas et al. we first deal with regular halfspaces. 

Definition 4.2. A vector $w \in \mathbb{R}^n$ with $\|w\| = 1$ is $\epsilon$-regular if $|w_i| \le \epsilon$ for all $i$. A halfspace $H_{w,\theta}$
is $\epsilon$-regular if $w$ is $\epsilon$-regular.

Let $t = 1/\epsilon^2$. We claim that for $\mathcal{H}$ pairwise independent and $G_0$ generating an almost 4-wise
independent distribution, $G$ fools regular halfspaces. Note that the randomness used by $G$ in this
setting is $O(\log n/\epsilon^2)$.

Theorem 4.3. Let $\mathcal{H}$ be an $\alpha$-almost pairwise independent family for $\alpha = O(1)$ and let $G_0$ generate
a $\delta$-almost 4-wise independent distribution for $\delta = \epsilon^2/4n^5$. Then, $G$ defined by Equation (3.2)
fools $\epsilon$-regular halfspaces with error at most $O(\epsilon)$ and seed length $O(\log n/\epsilon^2)$. In particular, for
$x \in \{1,-1\}^n$ generated from $G$ and $\epsilon$-regular $w$ with $\|w\| = 1$, the distribution of $\langle w, x\rangle$ is $O(\epsilon)$-close
to $\mathcal{N}(0, 1)$.

To prove the theorem we will need the Berry-Esseen theorem, which gives a quantitative form 
of the central limit theorem and can be seen as an invariance principle for halfspaces. 

Theorem 4.4 (Theorem 1, XVI.5, [Fel71], [She07]). Let $Y_1, \ldots, Y_t$ be independent random variables
with $\mathbb{E}[Y_i] = 0$, $\sum_i \mathbb{E}[Y_i^2] = \sigma^2$, $\sum_i \mathbb{E}[|Y_i|^3] \le \rho$. Let $F(\cdot)$ denote the cdf of the random variable
$S_t = (Y_1 + \ldots + Y_t)/\sigma$, and $\Phi(\cdot)$ denote the cdf of the normal distribution $\mathcal{N}(0, 1)$. Then,

$$\|F - \Phi\|_\infty = \sup_z |F(z) - \Phi(z)| \le \frac{\rho}{\sigma^3}.$$

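Corollary 4.5. Let $Y_1, \ldots, Y_t$ be independent random variables with $\mathbb{E}[Y_i] = 0$, $\sum_i \mathbb{E}[Y_i^2] = \sigma^2$,
$\sum_i \mathbb{E}[|Y_i|^4] \le \rho_4$. Let $F(\cdot)$ denote the cdf of the random variable $S_t = (Y_1 + \ldots + Y_t)/\sigma$, and $\Phi(\cdot)$
denote the cdf of the normal distribution $\mathcal{N}(0, 1)$. Then, for an absolute constant $C$,

$$\|F - \Phi\|_\infty = \sup_z |F(z) - \Phi(z)| \le \frac{C\sqrt{\rho_4}}{\sigma^2}.$$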



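Proof. For $1 \le i \le t$, by Cauchy-Schwarz, $\mathbb{E}[|Y_i|^3] \le \sqrt{\mathbb{E}[Y_i^2]} \cdot \sqrt{\mathbb{E}[Y_i^4]}$. Therefore

$$\sum_i \mathbb{E}[|Y_i|^3] \le \sum_i \sqrt{\mathbb{E}[Y_i^2]} \cdot \sqrt{\mathbb{E}[Y_i^4]} \le \left( \sum_i \mathbb{E}[Y_i^2] \right)^{1/2} \left( \sum_i \mathbb{E}[Y_i^4] \right)^{1/2}.$$

The claim now follows from Theorem 4.4. □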

Lemma 4.6. For $\epsilon$-regular $w$ with $\|w\| = 1$ and $x \in_u \{1,-1\}^n$, the distribution of $\langle w, x\rangle$ is $\epsilon$-close
to $\mathcal{N}(0, 1)$.

Proof. Let $Y_i = w_i x_i$. Then, $\sum_i \mathbb{E}[Y_i^2] = 1$ and $\sum_i \mathbb{E}[Y_i^4] = \sum_i w_i^4 \le \epsilon^2$. The lemma now follows
from Corollary 4.5. □
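
For small $n$ the Kolmogorov-Smirnov distance between $\langle w, x\rangle$ and $\mathcal{N}(0,1)$ can be computed exactly, which gives a quick numerical feel for why regularity matters. The Python sketch below is a hedged illustration: the function names are hypothetical, the computation enumerates all of $\{1,-1\}^n$, and it relies on the observation that for a step-function cdf the supremum is attained at one of the one-sided limits at a jump.

```python
import math
from itertools import product

def normal_cdf(z):
    """CDF of the standard normal distribution N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ks_distance_to_normal(w):
    """sup_t |Pr[<w,x> < t] - Phi(t)| for x uniform in {1,-1}^n (exact, small n only)."""
    n = len(w)
    mass = {}
    for x in product([1, -1], repeat=n):
        s = round(sum(a * b for a, b in zip(w, x)), 12)   # merge floating-point duplicates
        mass[s] = mass.get(s, 0.0) + 2.0 ** (-n)
    best, below = 0.0, 0.0
    for s in sorted(mass):
        phi = normal_cdf(s)
        # The step cdf jumps at s: compare Phi(s) with the limits from both sides.
        best = max(best, abs(below - phi), abs(below + mass[s] - phi))
        below += mass[s]
    return best

n = 12
regular_w = [1 / math.sqrt(n)] * n          # every |w_i| equals 1/sqrt(n)
dictator_w = [1.0, 0.0, 0.0, 0.0]           # all weight on one coordinate

print(ks_distance_to_normal(regular_w), max(regular_w))
print(ks_distance_to_normal(dictator_w))
```

For the regular vector the distance stays below $\max_i |w_i|$, in line with Lemma 4.6, while the single-coordinate vector remains far from normal.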
The following lemma says that for a pairwise-independent family of hash functions $\mathcal{H}$ and
$w \in \mathbb{R}^n$, the weight of the coefficients is almost equidistributed among the buckets.



Lemma 4.7. Let $\mathcal{H}$ be an $\alpha$-almost pairwise independent family of hash functions from $[n]$ to $[t]$.
For $\epsilon$-regular $w$ with $\|w\| = 1$,

$$\sum_{i=1}^{t} \mathbb{E}_h\left[ \|w_{h^{-1}(i)}\|^4 \right] \le (1 + \alpha)\epsilon^2 + \frac{1 + \alpha}{t}.$$



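Proof. Fix $i \in [t]$. For $1 \le j \le n$, let $X_j$ be the indicator variable that is 1 if $h(j) = i$ and 0
otherwise. Then, $\mathbb{E}[\|w_{h^{-1}(i)}\|^2] = 1/t$ and

$$\|w_{h^{-1}(i)}\|^4 = \left( \sum_{j=1}^{n} X_j w_j^2 \right)^2 = \sum_{j=1}^{n} X_j w_j^4 + \sum_{j \ne k} X_j X_k w_j^2 w_k^2.$$

Now, $\mathbb{E}[X_j] \le (1 + \alpha)/t$ and for $j \ne k$, $\mathbb{E}[X_j X_k] \le (1 + \alpha)/t^2$. Thus, taking expectations of the
above equation,

$$\begin{aligned}
\mathbb{E}\left[ \|w_{h^{-1}(i)}\|^4 \right] &\le \frac{1 + \alpha}{t} \sum_j w_j^4 + \frac{1 + \alpha}{t^2} \sum_{j \ne k} w_j^2 w_k^2 \\
&\le \frac{1 + \alpha}{t} \max_j w_j^2 + \frac{1 + \alpha}{t^2} \\
&\le \frac{(1 + \alpha)\epsilon^2}{t} + \frac{1 + \alpha}{t^2}.
\end{aligned}$$

The lemma follows by summing over all $i \in [t]$. □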
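
The bound of Lemma 4.7 can be sanity-checked exactly on a tiny instance by averaging over every map $h : [n] \to [t]$. The sketch below is an illustration under an assumption made here: it uses the family of all maps, which is pairwise independent with $\alpha = 0$ but far larger than the efficient families the construction relies on; the function name is hypothetical.

```python
from itertools import product

def expected_fourth_power_mass(w, t):
    """Average over ALL maps h: [n] -> [t] of sum_i ||w restricted to h^{-1}(i)||^4."""
    n = len(w)
    total = 0.0
    for h in product(range(t), repeat=n):     # h[j] is the bucket of coordinate j
        bucket_sq = [0.0] * t                 # bucket_sq[i] = ||w_{h^{-1}(i)}||^2
        for j, i in enumerate(h):
            bucket_sq[i] += w[j] ** 2
        total += sum(b * b for b in bucket_sq)
    return total / t ** n

w = [0.5, 0.5, 0.5, 0.5]      # ||w|| = 1 and w is eps-regular for eps = 0.5
t, eps, alpha = 2, 0.5, 0.0
value = expected_fourth_power_mass(w, t)
bound = (1 + alpha) * eps ** 2 + (1 + alpha) / t
print(value, bound)
```

For this instance the exact value is $\sum_j w_j^4 + \frac{1}{t}\sum_{j \ne k} w_j^2 w_k^2 = 0.625$, below the bound $0.75$ given by the lemma.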
Proof of Theorem 4.3. Fix a hash function $h \in \mathcal{H}$. Let $w^i = w|_{h^{-1}(i)}$ for $i \in [t]$. Then,

$$\langle w, G(h, z)\rangle = \sum_{i=1}^{t} \langle w^i, G_0(z^i)\rangle.$$

Let random variables $Y^h_i \equiv Y_i = \langle w^i, G_0(z^i)\rangle$ and $Y^h = Y_1 + \ldots + Y_t$. Then, $\mathbb{E}[Y_i] = 0$ and since
$G_0(z^i)$ is $\delta$-almost 4-wise independent, $|\mathbb{E}[Y_i^2] - \|w^i\|^2| \le \delta n^2$. Further, for $1 \le i \le t$,

$$\mathbb{E}_{x \in_u \{1,-1\}^m}\left[ \langle w^i, x\rangle^4 \right] = \sum_{j=1}^{m} (w^i_j)^4 + 3 \sum_{p \ne q \in [m]} (w^i_p)^2 (w^i_q)^2 \le 3\|w^i\|^4.$$

Since the above equation depends only on the first four moments of the random variable $x$ and
$G_0(z^i)$ is $\delta$-almost 4-wise independent, it follows that $\mathbb{E}[Y_i^4] \le 3\|w^i\|^4 + \delta n^4$. Thus, $\sum_i \mathbb{E}[Y_i^2] \ge
1 - \delta n^2 t \ge 1/2$ and $\sum_{i=1}^{t} \mathbb{E}[Y_i^4] \le 3\sum_{i=1}^{t} \|w^i\|^4 + \delta n^5$. Let $\rho_h = \sum_i \|w^i\|^4$. Then, by Corollary 4.5,
since $\delta \le \epsilon^2/4n^5$, for a fixed $h$ the distribution of $Y^h$ is $O(\sqrt{\rho_h} + \epsilon)$-close to $\mathcal{N}(0, 1)$.

Observe that for random $h$, $z$ the distribution of $Y = \langle w, G(h, z)\rangle$ is a convex combination of
the distributions of $Y^h$ for $h \in \mathcal{H}$. Thus, from Lemma 4.1, the distribution of $Y$ is $O(\mathbb{E}[\sqrt{\rho_h}] + \epsilon)$-close
to $\mathcal{N}(0, 1)$. Now, by Cauchy-Schwarz, $\mathbb{E}[\sqrt{\rho_h}] \le \sqrt{\mathbb{E}[\rho_h]}$. Further, since $w$ is $\epsilon$-regular and
$t = 1/\epsilon^2$, it follows from Lemma 4.7 that $\mathbb{E}[\rho_h] = \sum_i \mathbb{E}[\|w^i\|^4] = \sum_i \mathbb{E}[\|w_{h^{-1}(i)}\|^4] \le 2(1 + \alpha)\epsilon^2$.
Thus, the distribution of $Y$ is $O(\epsilon)$-close to $\mathcal{N}(0, 1)$. The theorem now follows from combining this
with Lemma 4.6. □
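
The fourth-moment identity used in the proof above, $\mathbb{E}[\langle w, x\rangle^4] = \sum_j w_j^4 + 3\sum_{p \ne q} w_p^2 w_q^2 \le 3\|w\|^4$ for uniform $x$, is easy to confirm by enumeration. The following Python sketch is a small check with hypothetical function names; the sum over $p \ne q$ is taken over ordered pairs, which is what makes the factor 3 correct.

```python
from itertools import product

def fourth_moment(w):
    """E[<w, x>^4] for x uniform in {1,-1}^m, by exhaustive enumeration."""
    m = len(w)
    return sum(sum(a * b for a, b in zip(w, x)) ** 4
               for x in product([1, -1], repeat=m)) / 2 ** m

def fourth_moment_formula(w):
    """sum_j w_j^4 + 3 * sum over ordered pairs p != q of w_p^2 w_q^2."""
    sq = [a * a for a in w]
    s2 = sum(sq)                      # ||w||^2
    s4 = sum(a * a for a in sq)       # sum_j w_j^4
    return s4 + 3 * (s2 * s2 - s4)    # (s2^2 - s4) is the ordered-pair sum

w = [1, 2, 3]
print(fourth_moment(w), fourth_moment_formula(w), 3 * sum(a * a for a in w) ** 2)
```

Since the left-hand side only involves products of at most four coordinates, the same value is obtained under any exactly 4-wise independent distribution, which is the property the proof exploits.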






4.2 PRGs for Arbitrary Halfspaces 

We now study arbitrary halfspaces and show that the generator G fools arbitrary halfspaces if 
the family of hash functions $\mathcal{H}$ and generator $G_0$ satisfy certain stronger properties. We use
the following structural result on halfspaces that follows from the results of Servedio [Ser06] and 
Diakonikolas et al. [DGJ+09]. 

Theorem 4.8. Let $H_{w,\theta}$ be a halfspace with $w_1 \ge \ldots \ge w_n$, $\sum_i w_i^2 = 1$. There exists $K = K(\epsilon) =
O(\log^2(1/\epsilon)/\epsilon^2)$ such that one of the following two conditions holds.

1. $w^K = (w_{K(\epsilon)+1}, \ldots, w_n)$ is $\epsilon$-regular.

2. Let $w' = (w_1, \ldots, w_{K(\epsilon)})$ and let $H_{w',\theta}(x) = \mathrm{sgn}(\sum_{i=1}^{K} w_i x_i - \theta)$. Then,

$$\left| \Pr_{x \leftarrow \mathcal{D}}[H_{w,\theta}(x) \ne H_{w',\theta}(x)] \right| \le 2\epsilon, \qquad (4.1)$$

where $\mathcal{D}$ is any distribution satisfying the following conditions for $x \leftarrow \mathcal{D}$.

(a) The distribution of $(x_1, \ldots, x_K)$ is $\epsilon$-close to uniform.

(b) With probability at least $1 - \epsilon$ over the choice of $(x_1, \ldots, x_K)$, the distribution of $(x_{K+1}, \ldots, x_n)$
conditioned on $(x_1, \ldots, x_K)$ is $(1/n^2)$-almost pairwise independent.

In particular, for distributions $\mathcal{D}$ as above

$$\left| \mathbb{E}_{x \leftarrow \mathcal{D}}[H_{w,\theta}(x)] - \mathbb{E}_{x \leftarrow \mathcal{D}}[H_{w',\theta}(x)] \right| \le 2\epsilon. \qquad (4.2)$$

Servedio and Diakonikolas et al. show the above result when T> is the uniform distribution. 
However, their arguments extend straightforwardly to any distribution T> as above. 

Given the above theorem, we use a case analysis to analyze $G$. If the first condition of the theorem above holds, we use the results of the previous section, Theorem 4.3, showing that $G$ fools regular halfspaces. If the second condition holds, we argue that for $x$ distributed as the output of the generator, the distribution of $(x_1, \ldots, x_{K(\epsilon)})$ is $O(\epsilon)$-close to uniform.

Let $t = K(\epsilon)$. We need the family of hash functions $\mathcal{H} : [n] \to [t]$ in the construction of $G$ to be balanced along with being $\alpha$-pairwise independent as in Equation (3.1). Intuitively, a hash family is balanced if with high probability the maximum size of a bucket is small.

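Definition 4.9 (Balanced Hash Functions). A family of hash functions $\mathcal{H} = \{h : [n] \to [t]\}$ is $(K, L, \beta)$-balanced if for any $S \subseteq [n]$, $|S| \le K$,

$$\Pr_{h \in_u \mathcal{H}}\Big[\max_{j \in [t]} |h^{-1}(j) \cap S| \ge L\Big] \le \beta. \qquad (4.3)$$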
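The balancedness condition can be explored empirically. The following is a minimal sketch, not the construction of [LRTV09]: it assumes the textbook affine family $h(x) = ((ax + b) \bmod p) \bmod t$ (which is only approximately pairwise independent), and the names `make_pairwise_hash`, `max_bucket_load` and `estimate_unbalanced_fraction` are hypothetical helpers introduced here for illustration.

```python
import random

def make_pairwise_hash(p, t, rng):
    """Sample h(x) = ((a*x + b) mod p) mod t from the affine family (an assumption)."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda x: ((a * x + b) % p) % t

def max_bucket_load(h, S, t):
    """Return max_j |h^{-1}(j) ∩ S|, the quantity bounded in Equation (4.3)."""
    loads = [0] * t
    for x in S:
        loads[h(x)] += 1
    return max(loads)

def estimate_unbalanced_fraction(t, S, L, trials, seed=0):
    """Empirical fraction of sampled h whose largest bucket on S has at least L elements."""
    rng = random.Random(seed)
    p = 2_147_483_647  # the prime 2^31 - 1; assumed larger than every element of S
    bad = 0
    for _ in range(trials):
        h = make_pairwise_hash(p, t, rng)
        if max_bucket_load(h, S, t) >= L:
            bad += 1
    return bad / trials
```

For a family to be $(K, L, \beta)$-balanced, the estimated fraction should stay below $\beta$ for every set $S$ of size at most $K$; the sketch only probes one set at a time.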

We use the following construction of balanced hash families due to Lovett et al. [LRTV09] . 

Theorem 4.10 (See Lemma 2.12 in [LRTV09]). Let $t = \log(1/\epsilon)/\epsilon^2$ and $K = K(\epsilon)$ as in Theorem 4.8. Then, there exists a $(K, O(\log(1/\epsilon)), 1/t^2)$-balanced hash family $\mathcal{H} : [n] \to [t]$ that is also pairwise independent with $|\mathcal{H}| = \exp(O(\log n + \log^2(1/\epsilon)))$. Moreover, $\mathcal{H}$ is efficiently samplable.

Let $m = n/t$ and fix $L$ to be one of $O(\log t)$, $O(\log n)$. We also need the generator $G_0 : \{0,1\}^{r_0} \to \{1,-1\}^m$ to be exactly 4-wise independent and $\delta$-almost $(L+4)$-wise independent for $\delta = \epsilon^3/tn^5$. Generators $G_0$ as above with $r_0 = O(\log n + \log(1/\delta) + L) = O(\log(n/\epsilon))$ are known [NN93].

We now show that with $\mathcal{H}, G_0$ as above, $G$ fools halfspaces with error $O(\epsilon)$. The randomness used by the generator is $\log|\mathcal{H}| + r_0 t = O(\log n \log^2(1/\epsilon)/\epsilon^2)$ and matches the randomness used in the results of Diakonikolas et al. [DGJ+09].
Theorem 4.11. With $\mathcal{H}, G_0$ chosen as above, $G$ defined by Equation (3.2) fools halfspaces with error at most $O(\epsilon)$ and seed length $O(\log n \log^2(1/\epsilon)/\epsilon^2)$.

Proof. Let $H_{w,\theta}$ be a halfspace and without loss of generality suppose that $w_1 \ge \cdots \ge w_n$ and $\sum_i w_i^2 = 1$. Let $S = \{1, \ldots, K(\epsilon)\}$. Call a hash function $S$-good if for all $j \in [t]$, $|S_j| = |S \cap h^{-1}(j)| \le L$. From Definition 4.9, a random hash function $h \in_u \mathcal{H}$ is $S$-good with probability at least $1 - 1/t^2$.

Recall that $G(h, z^1, \ldots, z^t) = x$, where $x|_{h^{-1}(j)} = G_0(z^j)$ for $j \in [t]$. Let $\mathcal{D}$ denote the distribution of the output of $G$ and let $x \leftarrow \mathcal{D}$.

Claim 4.12. Given an $S$-good hash function $h$, the distribution of $x|_S$ is $\epsilon$-close to uniform. Moreover, with probability at least $1-\epsilon$ over the random choices of $x|_S$, the distribution of $x$ in the coordinates not in $S$ conditioned on $x|_S$ is $(\epsilon^2/4n^5)$-almost 4-wise independent.

Proof. Fix an $S$-good hash function $h$. Since $z^1, \ldots, z^t$ are chosen independently, given the hash function $h$, $x|_{S_1}, \ldots, x|_{S_t}$ are independent of each other. Moreover, since the output of $G_0$ is $\delta$-almost $(L+4)$-wise independent and $|S_j| \le L$ for all $j \in [t]$, $x|_{S_j}$ is $\delta$-close to uniform for all $j \in [t]$. It follows that given an $S$-good hash function $h$, $x|_S$ is $(t\delta)$-close to uniform. Further, by a similar argument, for any set $I \subseteq [n] \setminus S$ with $|I| = 4$, the distribution of $x|_{S \cup I}$ is $(t\delta)$-close to uniform. It follows that, with probability at least $1-\epsilon$, the distribution of $x|_I$ conditioned on $x|_S$ is $(t\delta/\epsilon)$-close to uniform. The claim now follows from the above observations and noting that $t\delta = \epsilon^3/4n^5$. □

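We can now prove the theorem by a case analysis. Suppose that the weight vector $w$ satisfies condition (2) of Theorem 4.8. Observe that from the above claim, $\mathcal{D}$ satisfies the conditions of Theorem 4.8 (2). Let $H_{w|_S,\theta}(x) = \mathrm{sgn}(\langle w|_S, x|_S\rangle - \theta)$. Then, from Equation (4.2),

$$\Big| E_{x \leftarrow U_n}[H_{w,\theta}(x)] - E_{x \leftarrow U_n}[H_{w|_S,\theta}(x)] \Big| \le 2\epsilon,$$

$$\Big| E_{x \leftarrow \mathcal{D}}[H_{w,\theta}(x)] - E_{x \leftarrow \mathcal{D}}[H_{w|_S,\theta}(x)] \Big| \le 2\epsilon.$$

Moreover, since the distribution of $x|_S$ is $\epsilon$-close to uniform under $\mathcal{D}$ and $H_{w|_S,\theta}(x)$ only depends on $x|_S$,

$$\Big| E_{x \leftarrow U_n}[H_{w|_S,\theta}(x)] - E_{x \leftarrow \mathcal{D}}[H_{w|_S,\theta}(x)] \Big| \le \epsilon.$$

Combining the above three equations, we get that

$$\Big| E_{x \leftarrow U_n}[H_{w,\theta}(x)] - E_{x \leftarrow \mathcal{D}}[H_{w,\theta}(x)] \Big| \le 5\epsilon,$$

and thus $G$ fools the halfspace $H_{w,\theta}$ with error at most $5\epsilon$.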

Now suppose that condition (1) of Theorem 4.8 holds and $w_{\bar S} = (w_{K(\epsilon)+1}, \ldots, w_n)$ is $\epsilon$-regular, where $\bar S = [n] \setminus S$. Fix an assignment to the variables $x|_S = u|_S$ and let $x_{\bar S} = (x_{K+1}, \ldots, x_n)$ and $H_u(x_{K+1}, \ldots, x_n) = \mathrm{sgn}(\langle w_{\bar S}, x_{\bar S}\rangle - \theta_u)$, where $\theta_u = \theta - \langle w|_S, x|_S\rangle$. We will argue that with probability at least $1-\epsilon$, conditioned on the values of $x|_S$, the output of $G$ fools the $\epsilon$-regular halfspace $H_u$ with error $O(\epsilon)$. Given the last statement it follows that $\mathcal{D}$ fools the halfspace $H_{w,\theta}$ with error $O(\epsilon)$ since the distribution of $x|_S$ under $\mathcal{D}$ is $\epsilon$-close to uniform.

Since $\mathcal{H}$ is a family of pairwise independent hash functions and a random hash function $h \in_u \mathcal{H}$ is $S$-good with probability at least $1 - 1/t^2$, even when conditioned on being $S$-good, a random hash function $h \in_u \mathcal{H}$ is $\alpha$-pairwise independent for $\alpha = 1$. Further, from Claim 4.12, conditioned on the hash function $h$ being $S$-good, with probability at least $1-\epsilon$, even conditioned on $x|_S$, the distribution of $x|_{[n] \setminus S}$ is $(\epsilon^2/4n^5)$-almost 4-wise independent. Thus, we can apply Theorem 4.3 (see the footnote below), showing that with probability at least $1-\epsilon$, conditioned on the values of $x|_S$, the output of $G$ fools $H_u$ with error $O(\epsilon)$. □

Footnote 1: Though Theorem 4.3 was stated for $t = 1/\epsilon^2$, the same argument works for all $t \ge 1/\epsilon^2$.
4.3 Derandomizing G

We now derandomize the generator from the previous section and prove Theorem 1.3. The derandomization is motivated by the fact that for a fixed hash function $h$ and $w \in \mathbb{R}^n$, $\theta \in \mathbb{R}$, $\mathrm{sgn}(\langle w, G(h, z^1, \ldots, z^t)\rangle - \theta)$ can be computed by a monotone ROBP with $t$ layers. Given this observation, by Theorem 2.4, we can use PRGs for small-width ROBPs to generate $z^1, \ldots, z^t$ instead of generating them independently as before.

Let $r_0, t, m, \mathcal{H}, G_0$ be set as in the context of Theorem 4.11. Let $s_0 = \log(2t/\epsilon) = O(\log(1/\epsilon))$ and let $G_{BP} : \{0,1\}^r \to (\{0,1\}^{r_0})^t$ be a PRG fooling $(s_0, r_0, t)$-branching programs with error $\delta$. Define $G_D : \mathcal{H} \times \{0,1\}^r \to \{1,-1\}^n$ by

$$G_D(h, y) = G(h, G_{BP}(y)). \qquad (4.4)$$

The randomness used by the above generator is $\log|\mathcal{H}| + r$. We claim that $G_D$ fools halfspaces with error at most $O(\epsilon + \delta)$.

Theorem 4.13. $G_D$ fools halfspaces with error $O(\epsilon + \delta)$.

Proof. Fix a halfspace $H_{w,\theta}$ and without loss of generality suppose that $w_1, \ldots, w_n, \theta$ are integers. Let $N = \sum_j |w_j| + |\theta|$. Observe that for any $x \in \{1,-1\}^n$, $\langle w, x\rangle - \theta \in \{-N, -N+1, \ldots, 0, \ldots, N\}$. Fix a hash function $h \in \mathcal{H}$. We define a $(\log N, r_0, t)$-branching program $M_{h,w}$ that for $z = (z^1, \ldots, z^t) \in (\{0,1\}^{r_0})^t$ computes $\langle w, G(h,z)\rangle$.

For $i \in [t]$, let $w^i = w|_{h^{-1}(i)}$. Then, for $z = (z^1, \ldots, z^t) \in (\{0,1\}^{r_0})^t$, by definition of $G$ in Equation (3.2),

$$\langle w, G(h,z)\rangle = \sum_{i=1}^{t} \langle w^i, G_0(z^i)\rangle.$$

Define a space-bounded machine as follows. For each $0 \le i \le t$, put $N$ nodes in layer $i$ with labels $1, \ldots, N$. The vertices in layer $i$ correspond to the partial sums $Z_i = \sum_{l=1}^{i} \langle w^l, G_0(z^l)\rangle$. Note that all partial sums $Z_i$ lie in $\{-N, -N+1, \ldots, N\}$. Now, given the partial sum $Z_i$ there are $2^{r_0}$ possible values for $Z_{i+1}$ ranging in $\{Z_i + \langle w^{i+1}, G_0(z)\rangle : z \in \{0,1\}^{r_0}\}$. We add $2^{r_0}$ edges correspondingly. Finally, label all vertices in the final layer corresponding to values less than $\theta$ as rejecting and label all other vertices as accepting states.

It follows from the definition of $M_{h,w}$ that $M_{h,w}$ is monotone and for $z = (z^1, \ldots, z^t) \in (\{0,1\}^{r_0})^t$, $M_{h,w}(z)$ is an accepting state if and only if $\mathrm{sgn}(\sum_i \langle w^i, G_0(z^i)\rangle - \theta) = H_{w,\theta}(G(h,z)) = 1$. Thus, from Theorem 2.4, for a fixed $h \in \mathcal{H}$,

$$\Big| \Pr_{z \in_u (\{0,1\}^{r_0})^t}[H_{w,\theta}(G(h,z)) = 1] - \Pr_{y \in_u \{0,1\}^r}[H_{w,\theta}(G(h, G_{BP}(y))) = 1] \Big| \le \delta + \epsilon.$$

The theorem now follows from the above equation and Theorem 4.11. □ 
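The layered computation in the proof above can be made concrete with a small sketch. It is illustrative only: `toy_G0` is a hypothetical stand-in for $G_0$ (inner products of the seed with distinct nonzero index vectors, which gives only pairwise independence, not the stronger independence the theorems require), the hash function is passed as a list `h` with `h[i]` the bucket of coordinate `i`, and each block is stretched to the size of its bucket, which is an assumption made for simplicity.

```python
def toy_G0(z, m):
    """Toy seed-to-signs map: sign i is (-1)^{<z, i+1>} over GF(2). Assumption, not the paper's G_0."""
    return [1 - 2 * (bin(z & (i + 1)).count("1") % 2) for i in range(m)]

def buckets(h, t):
    """Return the preimages h^{-1}(0), ..., h^{-1}(t-1) as lists of coordinates."""
    return [[i for i, b in enumerate(h) if b == j] for j in range(t)]

def G(h, zs, G0=toy_G0):
    """Place the block G0(z^j) on the coordinates of bucket j, as in x|_{h^{-1}(j)} = G_0(z^j)."""
    x = [0] * len(h)
    for j, bucket in enumerate(buckets(h, len(zs))):
        block = G0(zs[j], len(bucket))
        for pos, i in enumerate(bucket):
            x[i] = block[pos]
    return x

def robp_accepts(w, theta, h, zs, G0=toy_G0):
    """Read z^1, ..., z^t one layer at a time, keeping only the partial sum Z_i as state."""
    state = 0
    for j, bucket in enumerate(buckets(h, len(zs))):
        block = G0(zs[j], len(bucket))
        state += sum(w[i] * block[pos] for pos, i in enumerate(bucket))
    # Final layer: values less than theta reject, all others accept.
    return state >= theta
```

The state after layer $i$ is exactly the partial sum $Z_i$, so the program needs only enough width to store one integer in $\{-N, \ldots, N\}$, and its final decision agrees with evaluating the halfspace directly on $G(h, z)$.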

By choosing the hash family $\mathcal{H}$ from Theorem 4.10 and using the PRG of Impagliazzo et al. we get our main result for fooling halfspaces.

Proof of Theorem 1.3. Choose $G_{BP}$ in the above theorem to be the PRG of Impagliazzo et al. [INW94]. To $\epsilon$-fool $(S, D, T)$-ROBPs, the generator of Impagliazzo et al. has a seed length of $O(D + (S + \log(1/\epsilon)) \log T)$. Thus, the seed length of $G_{BP}$ is $r = O(r_0 + (s_0 + \log(1/\epsilon)) \log t) = O(\log n + \log^2(1/\epsilon))$. The theorem follows by choosing the hash family $\mathcal{H}$ as in Theorem 4.10. □
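As a sanity check on the parameters, the seed-length arithmetic can be spelled out as follows. This is a sketch that takes $\delta = \epsilon$ for the error of the branching-program PRG and uses the values $r_0 = O(\log(n/\epsilon))$, $s_0 = O(\log(1/\epsilon))$ and $t = \mathrm{poly}(1/\epsilon)$ stated earlier in this section.

```latex
\begin{align*}
r &= O\big(r_0 + (s_0 + \log(1/\epsilon)) \log t\big) \\
  &= O\big(\log(n/\epsilon) + \log(1/\epsilon) \cdot \log(1/\epsilon)\big)
   = O\big(\log n + \log^2(1/\epsilon)\big), \\
\log|\mathcal{H}| + r
  &= O\big(\log n + \log^2(1/\epsilon)\big) + O\big(\log n + \log^2(1/\epsilon)\big)
   = O\big(\log n + \log^2(1/\epsilon)\big).
\end{align*}
```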






5 PRGs for Polynomial Threshold Functions 



We now extend our results from the previous section to construct PRGs for degree $d$ PTFs. We set the parameters of $G$ as in Theorem 4.11, with the main difference being that we take $G_0$ to generate a $k$-wise independent space for $k = O(\log^2(1/\epsilon)/\epsilon^{O(d)} + 4d)$ instead of $O(\log^2(1/\epsilon)/\epsilon^2)$ as was done for fooling halfspaces. The analysis of the construction is, however, more complicated and proceeds as follows.

1. We first use the invariance principle of Mossel et al. [MOO05] to deal with regular PTFs. 

2. We then use the structural results on random restrictions of PTFs of Diakonikolas et al. [DSTW10] 
and Harsha et al. [HKM09] to reduce the case of fooling arbitrary PTFs to that of fooling 
regular PTFs and functions depending only on a few variables. 

We carry out the first step above by an extension of the hybrid argument of Mossel et al. where 
we replace blocks of variables instead of single variables as done by Mossel et al. For this part of the 
analysis, we also need the anti-concentration results of Carbery and Wright [CW01] for low-degree 
polynomials over Gaussian distributions. 

The second step relies on properties of random restrictions of PTFs similar in spirit to those in Theorem 4.8 for halfspaces. Roughly speaking, we use the following results. There exists a set $S \subseteq [n]$ of at most $L = 1/\epsilon^{O(d)}$ variables such that for a random restriction of these variables, with probability at least $\Omega(1)$ one of the following happens.

1. The resulting PTF on the variables in $[n] \setminus S$ is $\epsilon$-regular.

2. The resulting PTF on the variables in $[n] \setminus S$ has high bias.

We then finish the analysis by recursively applying the above claim to show that a generator 
fooling regular PTFs and having bounded independence also fools arbitrary PTFs. 

5.1 PRGs for Regular PTFs 

Here we extend our result for fooling regular halfspaces, Theorem 4.3, to regular PTFs. 

Definition 5.1. Let $P(u_1, \ldots, u_n) = \sum_I a_I \prod_{i \in I} u_i$ be a multi-linear polynomial of degree $d$. We will assume throughout that $P$ is normalized with $\|P\|_2^2 = \sum_I a_I^2 = 1$. Let the influence of the $i$'th coordinate be $\tau_i(P) = \sum_{I \ni i} a_I^2$. We say $P$ is $\epsilon$-regular if

$$\sum_i \tau_i(P)^2 \le \epsilon^2.$$

We say a polynomial threshold function $f(x) = \mathrm{sgn}(P(x) - \theta)$ is $\epsilon$-regular if $P$ is $\epsilon$-regular.

Fix $d > 0$. Let $t = 1/\epsilon^2$, $m = n/t$ and let $\mathcal{H}$ be an $\alpha$-pairwise independent family as in Theorem 4.3. We assume $G_0 : \{0,1\}^{r_0} \to \{1,-1\}^m$ generates a $4d$-wise independent space, generalizing the assumption of 4-wise independence used for fooling regular halfspaces.

Theorem 5.2. Let $\mathcal{H}$ be an $\alpha$-pairwise independent family for $\alpha = O(1)$ and let $G_0$ generate a $4d$-wise independent distribution. Then, $G$ defined by Equation (3.2) fools $\epsilon$-regular PTFs of degree at most $d$ with error at most $O(d^3 9^d \epsilon^{2/(4d+1)})$.

We first prove some useful lemmas. The first lemma is simple. 
Lemma 5.3. For a multi-linear polynomial $P$ of degree $d$ with $\|P\| = 1$, $\sum_j \tau_j(P) \le d$.

The following lemma generalizes Lemma 4.7 and says that for pairwise independent hash func- 
tions and regular polynomials, the total influence is almost equidistributed among the buckets. 

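Lemma 5.4. Let $\mathcal{H} = \{h : [n] \to [t]\}$ be an $\alpha$-pairwise independent family of hash functions. Let $P$ be a multi-linear polynomial of degree $d$ with coefficients $(a_J)_{J \subseteq [n]}$ and $\|P\| \le 1$. For $h \in \mathcal{H}$ let

$$\tau(h, i) = \sum_{J : J \cap h^{-1}(i) \ne \emptyset} a_J^2.$$

Then, for $h \in_u \mathcal{H}$,

$$E_h\Big[\sum_{i=1}^{t} \tau(h,i)^2\Big] \le (1+\alpha) \sum_{i=1}^{n} \tau_i(P)^2 + \frac{(1+\alpha)\, d^2}{t}. \qquad (5.1)$$

Proof. Fix $i \in [t]$ and for $1 \le j \le n$, let $X_j$ be the indicator variable that is 1 if $h(j) = i$ and 0 otherwise. For brevity, let $\tau_j = \tau_j(P)$ for $j \in [n]$. Now,

$$\tau(h,i) = \sum_{J : J \cap h^{-1}(i) \ne \emptyset} a_J^2 = \sum_J a_J^2 \Big(\bigvee_{j \in J} X_j\Big) \le \sum_J a_J^2 \sum_{j \in J} X_j = \sum_j X_j \tau_j.$$

Thus,

$$\tau(h,i)^2 \le \Big(\sum_j X_j \tau_j\Big)^2 = \sum_j X_j^2 \tau_j^2 + \sum_{j \ne k} X_j X_k \tau_j \tau_k.$$

Note that $E[X_j] \le (1+\alpha)/t$ and for $j \ne k$, $E[X_j X_k] \le (1+\alpha)/t^2$. Thus,

$$E[\tau(h,i)^2] \le \frac{1+\alpha}{t} \sum_j \tau_j^2 + \frac{1+\alpha}{t^2} \sum_{j \ne k} \tau_j \tau_k.$$

The lemma follows by using Lemma 5.3 and summing over all $i \in [t]$. □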
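The quantities $\tau_j(P)$ and $\tau(h,i)$ are easy to compute for small examples, which helps when checking the inequality $\tau(h,i) \le \sum_{j \in h^{-1}(i)} \tau_j(P)$ used in the proof. The sketch below assumes a polynomial is stored as a dictionary from monomials (frozensets of variable indices) to coefficients and a hash function as a list of bucket labels; both representations and the function names are illustrative choices, not part of the paper.

```python
def influences(coeffs, n):
    """tau_j(P) = sum of a_J^2 over monomials J containing j."""
    tau = [0.0] * n
    for J, a in coeffs.items():
        for j in J:
            tau[j] += a * a
    return tau

def bucket_influence(coeffs, h, i):
    """tau(h, i) = sum of a_J^2 over monomials J that touch bucket h^{-1}(i)."""
    return sum(a * a for J, a in coeffs.items() if any(h[j] == i for j in J))

# Example: P = (x0*x1 + x2 + x1*x3 + x0) / 2, which has degree 2 and ||P|| = 1.
P = {
    frozenset({0, 1}): 0.5,
    frozenset({2}): 0.5,
    frozenset({1, 3}): 0.5,
    frozenset({0}): 0.5,
}
h = [0, 0, 1, 1]  # coordinates 0, 1 go to bucket 0; coordinates 2, 3 go to bucket 1
tau = influences(P, 4)
for i in range(2):
    bound = sum(tau[j] for j in range(4) if h[j] == i)
    assert bucket_influence(P, h, i) <= bound
```

In this example the total influence is $1.5 \le d = 2$, consistent with Lemma 5.3.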



We also use (2, 4)-hypercontractivity for degree $d$ polynomials, the anti-concentration bounds for polynomials over log-concave distributions due to Carbery and Wright [CW01], and the invariance principle of Mossel et al. [MOO05]. We state the relevant results below.

Lemma 5.5 ((2, 4)-hypercontractivity). If $Q, R$ are degree $d$ multilinear polynomials, then for $X \in_u \{1,-1\}^n$,

$$E_X[Q^2 \cdot R^2] \le 9^d \cdot E_X[Q^2] \cdot E_X[R^2].$$

In particular, $E[Q^4] \le 9^d \cdot E[Q^2]^2$.
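For small polynomials the inequality $E[Q^4] \le 9^d \cdot E[Q^2]^2$ can be verified by brute-force enumeration of the hypercube. The following sketch does so; the dictionary representation of a polynomial and the helper names `evaluate` and `moment` are assumptions made for illustration.

```python
from itertools import product
from math import prod

def evaluate(coeffs, x):
    """Evaluate a multilinear polynomial given as {frozenset of indices: coefficient}."""
    return sum(a * prod(x[j] for j in J) for J, a in coeffs.items())

def moment(coeffs, n, k):
    """Exact k-th moment E[Q(x)^k] for x uniform over {1, -1}^n (2^n evaluations)."""
    total = 0.0
    for x in product((1, -1), repeat=n):
        total += evaluate(coeffs, x) ** k
    return total / 2 ** n

# Example: Q = x0*x1 + x1*x2 + x0*x2, a degree-2 polynomial on 3 variables.
Q = {frozenset({0, 1}): 1, frozenset({1, 2}): 1, frozenset({0, 2}): 1}
d = 2
assert moment(Q, 3, 4) <= 9 ** d * moment(Q, 3, 2) ** 2
```

Here $Q$ takes the value 3 on the two constant inputs and $-1$ elsewhere, so $E[Q^2] = 3$ and $E[Q^4] = 21$, well within the bound $9^2 \cdot 3^2 = 729$.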
The following is a special case of Theorem 8 of Carbery-Wright [CW01] (in their notation, set $q = 2d$ and the distribution $\mu$ to be $\mathcal{N}(0,1)^n$).

Theorem 5.6 (Carbery-Wright). There exists an absolute constant $C$ such that for any multi-linear polynomial $P$ of degree at most $d$ with $\|P\| = 1$ and any interval $I \subseteq \mathbb{R}$ of length $\alpha > 0$,

$$\Pr_{X \leftarrow \mathcal{N}(0,1)^n}[P(X) \in I] \le C d\, \alpha^{1/d}.$$

We use the following structural result of Mossel et al. [MOO05] that reduces the problem of 
fooling threshold functions to that of fooling certain nice functions which are easier to analyze. 

Definition 5.7. A function $\psi : \mathbb{R} \to \mathbb{R}$ is $B$-nice if $\psi$ is smooth and $|\psi''''(t)| \le B$ for all $t \in \mathbb{R}$.

Lemma 5.8 (Mossel et al.). Let $X, Y$ be two real-valued random variables such that the following hold.

1. For any interval $I \subseteq \mathbb{R}$ of length at most $\alpha$, $\Pr[X \in I] \le C\alpha^{1/d}$ for a universal constant $C$.

2. For all 1-nice functions $\psi$, $|E[\psi(X)] - E[\psi(Y)]| \le \epsilon^2$.

Then, for all $t > 0$, $|\Pr[X > t] - \Pr[Y > t]| \le 2C\epsilon^{2/(4d+1)}$.

The following theorem is a restatement of the main result of Mossel et al., who obtain the bound $O(d\, 9^d \max_j \tau_j(P))$ instead of the one below. However, their arguments extend straightforwardly to the following.

Theorem 5.9 (Mossel et al.). Let $P$ be a multi-linear polynomial of degree at most $d$ with $\|P\| = 1$, $X \leftarrow \mathcal{N}(0,1)^n$ and $Y \in_u \{1,-1\}^n$. Then, for any 1-nice function $\psi$,

$$\big| E[\psi(P(X))] - E[\psi(P(Y))] \big| \le \frac{9^d}{12} \sum_i \tau_i(P)^2.$$

We first prove Theorem 5.2, assuming the following lemma which says that the generator $G$ fools nice functions of regular polynomials.

Lemma 5.10. Let $P$ be an $\epsilon$-regular multi-linear polynomial of degree at most $d$ with $\|P\| = 1$. Let $Y \in_u \{1,-1\}^n$ and $Z$ be distributed as the output of $G$. Then, for any 1-nice function $\psi$,

$$\big| E[\psi(P(Y))] - E[\psi(P(Z))] \big| \le \frac{(1+\alpha)}{6}\, d^2 9^d \epsilon^2.$$

Proof of Theorem 5.2. Let $P$ be an $\epsilon$-regular polynomial of degree at most $d$, let $\mathbf{X} \leftarrow \mathcal{N}(0,1)^n$, and let $\mathbf{Y}, \mathbf{Z}$ be as in Lemma 5.10. Let $X, Y, Z$ be real-valued random variables defined by $X = P(\mathbf{X})$, $Y = P(\mathbf{Y})$ and $Z = P(\mathbf{Z})$. Then, by Theorem 5.9 and Lemma 5.10, for any 1-nice function $\psi$,

$$|E[\psi(X)] - E[\psi(Y)]| \le \frac{9^d}{12}\epsilon^2, \qquad |E[\psi(Y)] - E[\psi(Z)]| \le \frac{(1+\alpha)}{6}\, d^2 9^d \epsilon^2.$$

Hence,

$$|E[\psi(X)] - E[\psi(Z)]| = O(d^2 9^d \epsilon^2).$$

Further, by Theorem 5.6, for any interval $I \subseteq \mathbb{R}$ of length at most $\alpha$, $\Pr[X \in I] = O(d\,\alpha^{1/d})$. Therefore, we can apply Lemma 5.8 to $X, Y$ and to $X, Z$ to get

$$|\Pr[X > t] - \Pr[Y > t]| = O(d^3 9^d \epsilon^{2/(4d+1)}), \qquad |\Pr[X > t] - \Pr[Z > t]| = O(d^3 9^d \epsilon^{2/(4d+1)}).$$

Thus,

$$|\Pr[Y > t] - \Pr[Z > t]| = O(d^3 9^d \epsilon^{2/(4d+1)}).$$

□
Proof of Lemma 5.10. Fix a hash function $h \in \mathcal{H}$. Let $Z_1, \ldots, Z_t$ be $t$ independent samples generated from the $4d$-wise independent space. Let $Y_1, \ldots, Y_t$ be $t$ independent samples chosen uniformly from $\{1,-1\}^m$. We will prove the claim via a hybrid argument where we replace the blocks $Y_1, \ldots, Y_t$ with $Z_1, \ldots, Z_t$ progressively.

For $0 \le i \le t$, let $X^i$ be the distribution with $X^i|_{h^{-1}(j)} = Z_j$ for $1 \le j \le i$ and $X^i|_{h^{-1}(j)} = Y_j$ for $i < j \le t$. Then, for a fixed hash function $h$, $X^0$ is uniformly distributed over $\{1,-1\}^n$ and $X^t$ is distributed as the output of the generator. For $i \in [t]$, let $\tau(h,i)$ be the influence of the $i$'th bucket under $h$,

$$\tau(h,i) = \sum_{J : J \cap h^{-1}(i) \ne \emptyset} a_J^2.$$

Claim 5.11. For $1 \le i \le t$,

$$\big| E[\psi(P(X^i))] - E[\psi(P(X^{i-1}))] \big| \le \frac{9^d}{12}\, \tau(h,i)^2.$$

Proof. Let $I = h^{-1}(i)$ be the variables that have been changed from $X^{i-1}$ to $X^i$. Without loss of generality suppose that $I = \{1, \ldots, m\}$. Let

$$P(u_1, \ldots, u_n) = R(u_{m+1}, \ldots, u_n) + \sum_{J : J \cap [m] \ne \emptyset} a_J \prod_{j \in J} u_j,$$

where $R(\cdot)$ is a multi-linear polynomial of degree at most $d$. Let $S(u_1, \ldots, u_m, u_{m+1}, \ldots, u_n)$ denote the degree $d$ multi-linear polynomial given by the second term in the above expression.

Observe that $X^{i-1}, X^i$ agree on coordinates not in $[m]$. Let $X^i = (Z_1, \ldots, Z_m, X_{m+1}, \ldots, X_n) = (Z, X)$ and $X^{i-1} = (Y_1, \ldots, Y_m, X_{m+1}, \ldots, X_n) = (Y, X)$. Then,

$$P(X^i) = R(X) + S(Z, X), \qquad P(X^{i-1}) = R(X) + S(Y, X).$$

Now, by using the Taylor series expansion for $\psi$ at $R(X)$, and writing $\pm c$ for a term of absolute value at most $c$,

$$E[\psi(P(X^i))] - E[\psi(P(X^{i-1}))] = E[\psi(R + S(Z,X))] - E[\psi(R + S(Y,X))]$$

$$= E\Big[\psi(R) + \psi'(R) S(Z,X) + \frac{\psi''(R)}{2} S(Z,X)^2 + \frac{\psi'''(R)}{6} S(Z,X)^3 \pm \frac{1}{24} S(Z,X)^4\Big]$$

$$\quad - E\Big[\psi(R) + \psi'(R) S(Y,X) + \frac{\psi''(R)}{2} S(Y,X)^2 + \frac{\psi'''(R)}{6} S(Y,X)^3 \pm \frac{1}{24} S(Y,X)^4\Big].$$

Observe that $X, Y, Z$ are independent of one another and are $4d$-wise independent individually. Since $S(\cdot)$ has degree at most $d$, it follows that for a fixed assignment of the variables $X_{m+1}, \ldots, X_n$ in $X$,

$$E[S(Z,X)] = E[S(Y,X)], \qquad E[S(Z,X)^2] = E[S(Y,X)^2],$$
$$E[S(Z,X)^3] = E[S(Y,X)^3], \qquad E[S(Z,X)^4] = E[S(Y,X)^4].$$

Combining the above equations we get

$$\big| E[\psi(P(X^i))] - E[\psi(P(X^{i-1}))] \big| \le \frac{1}{12} E[S(Y,X)^4]. \qquad (5.2)$$
Now, using the fact that $S(\cdot)$ is a multi-linear polynomial of degree at most $d$ and since $(Y, X)$ is $4d$-wise independent, $E[S(Y,X)^4] = E[S(W)^4]$, where $W$ is uniformly distributed over $\{1,-1\}^n$. Also note that

$$E[S(W)^2] = E\Big[\Big(\sum_{J : J \cap [m] \ne \emptyset} a_J \prod_{j \in J} W_j\Big)^2\Big] = \sum_{J : J \cap [m] \ne \emptyset} a_J^2 = \tau(h,i).$$

Therefore, using the (2, 4)-hypercontractivity inequality, Lemma 5.5, $E[S(W)^4] \le 9^d\, E[S(W)^2]^2$, and Equation (5.2),

$$\big| E[\psi(P(X^i))] - E[\psi(P(X^{i-1}))] \big| \le \frac{1}{12} E[S(Y,X)^4] = \frac{1}{12} E[S(W)^4] \le \frac{9^d}{12}\, \tau(h,i)^2.$$

□



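Proof of Lemma 5.10 Continued. From Claim 5.11, for a fixed hash function $h$ we have

$$\big| E[\psi(P(Y))] - E[\psi(P(Z))] \big| \le \sum_{i=1}^{t} \big| E[\psi(P(X^i))] - E[\psi(P(X^{i-1}))] \big| \le \frac{9^d}{12} \sum_{i=1}^{t} \tau(h,i)^2.$$

Therefore, for $h \in_u \mathcal{H}$, using Lemma 5.4 and $t = 1/\epsilon^2$,

$$\big| E[\psi(P(Y))] - E[\psi(P(Z))] \big| \le \frac{9^d}{12}\, E_h\Big[\sum_{i=1}^{t} \tau(h,i)^2\Big] \le \frac{9^d}{12} (1+\alpha)(1 + d^2)\epsilon^2 \le \frac{(1+\alpha)}{6}\, d^2 9^d \epsilon^2.$$

□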
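For readers tracking the constants, the last chain of inequalities can be unpacked as follows. This is a sketch that assumes the bound of Lemma 5.4 has the form obtained by summing the per-bucket estimate in its proof over $i \in [t]$, and uses $\epsilon$-regularity of $P$, Lemma 5.3, $t = 1/\epsilon^2$ and $d \ge 1$.

```latex
\begin{align*}
\mathbb{E}_h\Big[\sum_{i=1}^{t} \tau(h,i)^2\Big]
  &\le (1+\alpha)\sum_j \tau_j(P)^2 + \frac{1+\alpha}{t}\Big(\sum_j \tau_j(P)\Big)^2 \\
  &\le (1+\alpha)\,\epsilon^2 + (1+\alpha)\, d^2 \epsilon^2
   \;\le\; 2(1+\alpha)\, d^2 \epsilon^2, \\
\frac{9^d}{12}\cdot 2(1+\alpha)\, d^2\epsilon^2
  &= \frac{(1+\alpha)}{6}\, d^2\, 9^d\, \epsilon^2 .
\end{align*}
```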



5.2 Random Restrictions of PTFs 

We use the following results on random restrictions of Diakonikolas et al. [DSTW10] and Harsha et al. [HKM09]. We mainly use the exact statements from the work of Harsha et al., as the notion of regular polynomials from Diakonikolas et al. is slightly different from ours. Specifically, Diakonikolas et al. define regularity of a polynomial $P$ by bounding $\max_i \tau_i(P)$, but in our analysis we use the bound on $\sum_i \tau_i(P)^2$. Diakonikolas et al. have a statement similar to Lemma 5.16 below; however, we give a simple argument starting from the main lemmas of Harsha et al. for completeness.

Fix a polynomial $P$ of degree at most $d$ and suppose that $\tau_1(P) \ge \tau_2(P) \ge \cdots \ge \tau_n(P)$. Let $K(P, \epsilon) = K$ be the least index $i$ such that for all $j > i$,

$$\tau_j(P) \le \epsilon^2 \sum_{l > i} \tau_l(P).$$
Lemma 5.12 (Lemma 5.1 in Harsha et al. [HKM09]). The polynomial $P_{x_K}(Y_{K+1}, \ldots, Y_n) = P(x_1, \ldots, x_K, Y_{K+1}, \ldots, Y_n)$ in variables $Y_{K+1}, \ldots, Y_n$ obtained by choosing $x_1, \ldots, x_K \in_u \{1,-1\}$ is $c_d\epsilon$-regular with probability at least $\gamma_d$, for some universal constants $c_d, \gamma_d > 0$.

Lemma 5.13 (Lemma 5.2 in Harsha et al. [HKM09]). There exist universal constants $c, c_d, \delta_d > 0$ such that for $K(P, \epsilon) \ge c\log(1/\epsilon)/\epsilon^2 = L$, the following holds for all $\theta \in \mathbb{R}$. For a random partial assignment $(x_1, \ldots, x_L) \in_u \{1,-1\}^L$, with probability at least $\delta_d$ the following happens. There exists $b \in \{1,-1\}$ such that

$$\Pr_{Y \leftarrow D}[\mathrm{sign}(P(x_1, x_2, \ldots, x_L, Y_{L+1}, \ldots, Y_n) - \theta) \ne b] \le c_d\epsilon, \qquad (5.3)$$

for any $2d$-wise independent distribution $D$ over $\{1,-1\}^{n-L}$.

The above lemma is proved by Harsha et al. when $D$ is the uniform distribution over $\{1,-1\}^{n-L}$. However, their argument extends straightforwardly to $2d$-wise independent distributions $D$.

By repeatedly applying the above lemmas, we show that arbitrary low-degree PTFs can be 
approximated by small depth decision trees in which the leaf nodes either compute a regular PTF 
or a function with high bias. We first introduce some notation to this end. 

Definition 5.14. A block decision tree $T$ with block-size $L$ is a decision tree with the following properties. Each internal node of the decision tree reads at most $L$ variables. For each leaf node $\rho \in T$, the output upon reaching the leaf node $\rho$ is a function $f_\rho : \{1,-1\}^{V_\rho} \to \{1,-1\}$, where $V_\rho$ is the set of variables not occurring on the path to the node $\rho$. The depth of $T$ is the length of the longest path from the root of $T$ to a leaf in $T$.

Definition 5.15. Given a block decision tree $T$ computing a function $f$, we say that a leaf node $\rho \in T$ is $(\epsilon, d)$-good if the function $f_\rho$ satisfies one of the following two properties.

1. There exists $b \in \{1,-1\}$ such that for any $2d$-wise independent distribution $D$ over $\{1,-1\}^{V_\rho}$, $\Pr_{x \leftarrow D}[f_\rho(x) \ne b] \le \epsilon$.

2. $f_\rho$ is an $\epsilon$-regular degree $d$ PTF.

We now show a lemma on writing low-degree PTFs as a "decision tree of regular PTFs" . 

Lemma 5.16. There exist universal constants $c'_d, c''_d$ such that the following holds for any degree $d$ polynomial $P$ and PTF $f = \mathrm{sign}(P(\cdot) - \theta)$. There exists a block decision tree $T$ computing $f$ of block-size $L = c'_d\log(1/\epsilon)/\epsilon^2$ and depth at most $c''_d\log(1/\epsilon)$, such that with probability at least $1-\epsilon$ a uniformly random walk on the tree leads to an $(\epsilon, d)$-good leaf node.

Proof. The proof is by recursively applying Lemmas 5.12 and 5.13. Let $c, c_d, \gamma_d, \delta_d$ be the constants from the above lemmas. Let $L$ be defined as in Lemma 5.13 and let $\alpha = \min(\gamma_d, \delta_d)$. For $S \subseteq [n]$ and a partial assignment $y \in \{1,-1\}^S$, let $P_y : \{1,-1\}^{[n] \setminus S} \to \mathbb{R}$ be the degree at most $d$ polynomial defined by $P_y(Y) = P(Z)$, where $Z_i = y_i$ for $i \in S$ and $Z_i = Y_i$ for $i \notin S$. Let $L(y) = \min(K(P_y, \epsilon), L)$ and let $I(y)$ be the $L(y)$ largest influence coordinates in the polynomial $P_y$. We now define a block decision tree computing $f$ inductively.

Let $y_0 = \emptyset$ and let $I_0 = I(y_0)$. The root of the decision tree reads the variables in $I_0$. For $0 \le m < \log_{1/(1-\alpha)}(1/\epsilon)$, suppose that after $m$ steps we are at a node $\beta$ having read the variables in $S(\beta) \subseteq [n]$ and a corresponding partial assignment $y$. Then, if $P_y$ is $c_d\epsilon$-regular or if $P_y$ satisfies Equation (5.3), we stop. Else, we make another step and read the values of the variables in $I(y)$.
For any leaf node $\rho$, let $y(\rho)$ denote the partial assignment that leads to $\rho$. Then the leaf node $\rho$ outputs the function $f_\rho(Y) = \mathrm{sign}(P_{y(\rho)}(Y) - \theta)$.

It follows from the construction that $T$ is a block decision tree computing $f$ with block-size $L$ and depth at most $\log_{1/(1-\alpha)}(1/\epsilon)$. Further, for any internal node $\beta \in T$, by Lemmas 5.12 and 5.13 at least an $\alpha$ fraction of its children are $(c_d\epsilon, d)$-good. Since any leaf node that is not $(c_d\epsilon, d)$-good is at least $\log_{1/(1-\alpha)}(1/\epsilon)$ far away from the root of $T$, it follows that a uniformly random walk on $T$ leads to a $(c_d\epsilon, d)$-good node with probability at least $1-\epsilon$. The lemma now follows. □
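The depth bound can be checked with a short calculation. The sketch below reads the depth as a logarithm to base $1/(1-\alpha)$ and assumes, as in the proof, that each step from an internal node reaches a good node with probability at least $\alpha$ over the random restriction.

```latex
\begin{align*}
\Pr[\text{walk has not stopped after } m \text{ steps}] &\le (1-\alpha)^m, \\
(1-\alpha)^m \le \epsilon
  &\iff m \ge \frac{\ln(1/\epsilon)}{\ln\big(1/(1-\alpha)\big)}, \\
\ln\big(1/(1-\alpha)\big) \ge \alpha
  &\implies m = \Big\lceil \frac{\ln(1/\epsilon)}{\alpha} \Big\rceil \text{ steps suffice},
\end{align*}
```

so for constant $\alpha$ the depth is $O(\log(1/\epsilon))$, matching the statement of Lemma 5.16.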

5.3 PRGs for Arbitrary PTFs 

We now study the case of arbitrary degree $d$ PTFs. As was done for halfspaces, we will show that the generator $G$ of Equation (3.2) fools arbitrary PTFs if the family of hash functions $\mathcal{H}$ and generator $G_0$ satisfy stronger properties.

Let $t = c'_d c''_d \log^2(1/\epsilon)/\epsilon^2$, $m = n/t$, where $c'_d, c''_d$ are the constants from Lemma 5.16. We use a family of hash functions $\mathcal{H} : [n] \to [t]$ that are $\alpha$-pairwise independent for $\alpha = O(1)$. We choose the generator $G_0 : \{0,1\}^{r_0} \to \{1,-1\}^m$ to generate a $(t + 4d)$-wise independent space. Generators $G_0$ with $r_0 = O(t\log n)$ are known. We claim that with the above setting of parameters the generator $G$ fools all degree $d$ PTFs.

Theorem 5.17. With $\mathcal{H}, G_0$ chosen as above, $G$ defined by Equation (3.2) fools degree $d$ PTFs with error at most $O(\epsilon^{2/(4d+1)})$ and seed length $O_d(\log n \log^4(1/\epsilon)/\epsilon^4)$.

The bound on the seed length of the generator follows directly from the parameter settings. By carefully tracing the constants involved in our calculations and those in the results of Harsha et al. we need, the exact seed length can be shown to be $a^d \log n \log^4(1/\epsilon)/\epsilon^4$ for a universal constant $a$.

Fix a polynomial $P$ of degree $d$ and a PTF $f(x) = \mathrm{sign}(P(x) - \theta)$ and let $T$ denote the block decision tree computing $f$ as given by Lemma 5.16. Let $\mathcal{D}_{PTF}$ denote the output distribution of the generator $G$ with parameters set as above. The intuition behind the proof of the theorem is as follows.

1. As $\mathcal{D}_{PTF}$ has sufficient bounded independence, the distribution on the leaf nodes of $T$ obtained by taking a walk on $T$ according to inputs chosen from $\mathcal{D}_{PTF}$ is the same as the case when inputs are chosen uniformly. In particular, a random walk on $T$ according to $\mathcal{D}_{PTF}$ leads to an $(\epsilon, d)$-good leaf node with high probability.

2. As $G$ fools regular PTFs by Theorem 5.2, $\mathcal{D}_{PTF}$ will fool the function $f_\rho$ computed at an $(\epsilon, d)$-good leaf node. We also need to address the subtle issue that we really need $\mathcal{D}_{PTF}$ to fool a regular PTF $f_\rho$ even when conditioned on reaching a particular leaf node $\rho$.

We first set up some notation. For a leaf node $\rho \in T$, let $U_\rho = [n] \setminus V_\rho$ be the set of variables seen on the path to $\rho$ and let $a_\rho$ be the corresponding assignment of variables in $U_\rho$ that leads to $\rho$. Further, given an assignment $x$, let $\mathrm{Leaf}(x)$ denote the leaf node reached by taking a walk according to $x$ on $T$.

Lemma 5.18. For any leaf node $\rho$ of $T$,

$$\Pr_{x \leftarrow \mathcal{D}_{PTF}}[\mathrm{Leaf}(x) = \rho] = \Pr_{x \in_u \{1,-1\}^n}[\mathrm{Leaf}(x) = \rho].$$
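Proof. Observe that $\mathcal{D}_{PTF}$ is a $t$-wise independent distribution and that for any $\rho$, $|U_\rho| \le c'_d c''_d \log^2(1/\epsilon)/\epsilon^2 = t$. Thus,

$$\Pr_{x \leftarrow \mathcal{D}_{PTF}}[\mathrm{Leaf}(x) = \rho] = \Pr_{x \leftarrow \mathcal{D}_{PTF}}[x|_{U_\rho} = a_\rho] = \frac{1}{2^{|U_\rho|}} = \Pr_{x \in_u \{1,-1\}^n}[x|_{U_\rho} = a_\rho] = \Pr_{x \in_u \{1,-1\}^n}[\mathrm{Leaf}(x) = \rho].$$

□

Lemma 5.19. Fix an $(\epsilon, d)$-good leaf node $\rho$ of $T$. Then,

$$\Big| \Pr_{x \leftarrow \mathcal{D}_{PTF}}[f_\rho(x|_{V_\rho}) = 1 \mid x|_{U_\rho} = a_\rho] - \Pr_{y \in_u \{1,-1\}^{V_\rho}}[f_\rho(y) = 1] \Big| = O(\epsilon^{2/(4d+1)}).$$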

Proof. We consider two cases depending on which of the two conditions of Definition 5.15 $f_\rho$ satisfies.

Case (1): $f_\rho$ has high bias. Note that $\mathcal{D}_{PTF}$ is a $(t + 4d)$-wise independent distribution. Since $|U_\rho| \le t$, it follows that for $x \leftarrow \mathcal{D}_{PTF}$, even conditioned on $x|_{U_\rho} = a_\rho$, the distribution is $2d$-wise independent. The lemma then follows from the fact that for some $b \in \{1,-1\}$, $f_\rho$ evaluates to $b$ with high probability.

Case (2): $f_\rho$ is an $\epsilon$-regular degree $d$ PTF. We deal with this case by using Theorem 5.2. Let $x = G(h, z^1, \ldots, z^t)$ for $h \in_u \mathcal{H}$, $z^1, \ldots, z^t \in_u \{0,1\}^{r_0}$, so that $x \leftarrow \mathcal{D}_{PTF}$ as in the definition of $G$. Let $h_\rho : V_\rho \to [t]$ be the restriction of a hash function $h$ to indices in $V_\rho$. For brevity, let $x(\rho) = x|_{V_\rho}$ and let $E_\rho$ be the event $x|_{U_\rho} = a_\rho$. We show that the distribution of $x(\rho)$, conditioned on $E_\rho$, satisfies the conditions of Theorem 5.2.

Observe that conditioning on $E_\rho$ does not change the distribution of the hash function $h \in_u \mathcal{H}$ because $|U_\rho| \le t$ and $\mathcal{D}_{PTF}$ is $t$-wise independent. Thus, even when conditioned on $E_\rho$, the hash functions $h_\rho$ are almost pairwise independent. For a hash function $h$ and $i \in [t]$, let $B_\rho(h, i) = V_\rho \cap h^{-1}(i)$. Now, since $G_0$ generates a $(t + 4d)$-wise independent distribution, even conditioned on $E_\rho$, for a fixed hash function $h$, the random variables $x(\rho)|_{B_\rho(h,1)}, x(\rho)|_{B_\rho(h,2)}, \ldots, x(\rho)|_{B_\rho(h,t)}$ are independent of one another. Moreover, each $x(\rho)|_{B_\rho(h,i)}$ is $4d$-wise independent for $i \in [t]$.

Thus, even conditioned on $E_\rho$, the distribution of $x(\rho)$ satisfies the conditions of Theorem 5.2 and hence fools the regular degree $d$ PTF $f_\rho$ with error at most $O(\epsilon^{2/(4d+1)})$. The lemma now follows. □

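Proof of Theorem 5.17. Observe that

$$\Pr_{x \in_u \{1,-1\}^n}[f(x) = 1] = \sum_{\rho \in \mathrm{Leaves}(T)} \Pr_{x \in_u \{1,-1\}^n}[x|_{U_\rho} = a_\rho] \cdot \Pr_{y \in_u \{1,-1\}^{V_\rho}}[f_\rho(y) = 1].$$

Similarly,

$$\Pr_{x \leftarrow \mathcal{D}_{PTF}}[f(x) = 1] = \sum_{\rho \in \mathrm{Leaves}(T)} \Pr_{x \leftarrow \mathcal{D}_{PTF}}[x|_{U_\rho} = a_\rho] \cdot \Pr_{x \leftarrow \mathcal{D}_{PTF}}[f_\rho(x|_{V_\rho}) = 1 \mid x|_{U_\rho} = a_\rho].$$

From the above equations and Lemma 5.18 it follows that

$$\Big| \Pr_{x \in_u \{1,-1\}^n}[f(x) = 1] - \Pr_{x \leftarrow \mathcal{D}_{PTF}}[f(x) = 1] \Big| \le \sum_{\rho \in \mathrm{Leaves}(T)} \Pr_{x \in_u \{1,-1\}^n}[x|_{U_\rho} = a_\rho] \cdot \Big| \Pr_{x \leftarrow \mathcal{D}_{PTF}}[f_\rho(x|_{V_\rho}) = 1 \mid x|_{U_\rho} = a_\rho] - \Pr_{y \in_u \{1,-1\}^{V_\rho}}[f_\rho(y) = 1] \Big|.$$

Now, by Lemma 5.19, for any $(\epsilon, d)$-good leaf $\rho$ the corresponding term on the right hand side of the above equation is $O(\epsilon^{2/(4d+1)})$. Further, from Lemma 5.16 we know that a random walk ends at a good leaf with probability at least $1-\epsilon$. It follows that

$$\Big| \Pr_{x \in_u \{1,-1\}^n}[f(x) = 1] - \Pr_{x \leftarrow \mathcal{D}_{PTF}}[f(x) = 1] \Big| \le \epsilon + O(\epsilon^{2/(4d+1)}) = O(\epsilon^{2/(4d+1)}).$$

□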

Our main theorem on fooling degree d PTFs, Theorem 1.2, follows immediately from the above 
theorem. 



6 PRGs for Spherical Caps 

We now show how to extend the generator for fooling regular halfspaces and its analysis from 
Section 4.1 to get a PRG for spherical caps and prove Theorem 1.4. 

Let $\mu$ be a discrete distribution (if not, suppose we can discretize $\mu$) over a set $U \subseteq \mathbb{R}$. Also, suppose that for $X \leftarrow \mu$, $E[X] = 0$, $E[X^2] = 1$, $E[|X|^3] = O(1)$. Given such a distribution $\mu$, a natural approach for extending $G$ to $\mu$ is to replace the $k$-wise independent space generator $G_0 : \{0,1\}^r \to \{1,-1\}^m$ from Equation (3.2) with a generator $G_\mu : \{0,1\}^r \to U^m$ that generates a $k$-wise independent space over $U^m$. It follows from the analysis of Section 4.1 that for $G_\mu$ chosen with appropriate parameters, the above generator fools regular halfspaces over $\mu^n$. It then remains to fool non-regular halfspaces over $\mu^n$. It is reasonable to expect that an analysis similar to that in Section 4.2 can be applied to $\mu^n$, provided we have analogues of the results of Servedio and Diakonikolas et al., Theorem 4.8, for $\mu^n$.

The above ideas can be used to get a PRG for spherical caps by noting that a) the uniform dis- 
tribution over the sphere is close to a product of Gaussians (when the test functions are halfspaces) 
and b) analogues of Theorem 4.8 for product of Gaussians follow from known anti-concentration 
properties of the univariate Gaussian distribution. Building on the above argument, Gopalan et 
al. [GOWZ10] recently obtained PRGs fooling halfspaces over "reasonable" product distributions. 
Here we take a different approach and give a simpler, more direct construction for spherical caps 
based on an idea of Ailon and Chazelle [AC06] and the invariance of spherical caps with respect to 
unitary rotations. 

Let $S_{n-1} = \{x \in \mathbb{R}^n : \|x\|_2 = 1\}$ denote the $n$-dimensional sphere. By a spherical cap $S_{w,\theta}$ we mean the section of $S_{n-1}$ cut by a halfspace, i.e., $S_{w,\theta} = \{x \in S_{n-1} : H_{w,\theta}(x) = 1\}$.

Definition 6.1. A function $G : \{0,1\}^r \to S_{n-1}$ is said to $\epsilon$-fool spherical caps if, for all spherical caps $S_{w,\theta}$,

$$\Big| \Pr_{x \leftarrow U_{sp}}[x \in S_{w,\theta}] - \Pr_{y \in_u \{0,1\}^r}[G(y) \in S_{w,\theta}] \Big| \le \epsilon.$$

Note that the uniform distribution over $S_{n-1}$, $U_{sp}$, is not a product distribution. We first show that $U_{sp}$ is close to $\mathcal{N}(0, 1/\sqrt{n})^n$ when the test functions are halfspaces.

Lemma 6.2. There exists a universal constant $C$ such that for any halfspace $H_{w,\theta}$,

$$\Big| \Pr_{x \leftarrow U_{sp}}[H_{w,\theta}(x) = 1] - \Pr_{x \leftarrow \mathcal{N}(0,1/\sqrt{n})^n}[H_{w,\theta}(x) = 1] \Big| \le \frac{C\log n}{n^{1/4}}.$$

In particular, for $x \leftarrow U_{sp}$, the distribution of $\langle w, x\rangle$ is $O(\log n/n^{1/4})$-close to $\mathcal{N}(0, 1/\sqrt{n})$.
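Proof. Observe that for $x \leftarrow \mathcal{N}(0, 1/\sqrt{n})^n$, $x/\|x\|_2$ is distributed uniformly over $S_{n-1}$. Thus,

$$\Pr_{x \leftarrow U_{sp}}[H_{w,\theta}(x) = 1] = \Pr_{x \leftarrow \mathcal{N}(0,1/\sqrt{n})^n}\left[H_{w,\theta}\!\left(\frac{x}{\|x\|_2}\right) = 1\right].$$

Now, for any $x \in \mathbb{R}^n$,

$$\left|\langle w, x\rangle - \frac{\langle w, x\rangle}{\|x\|_2}\right| = \frac{|\langle w, x\rangle|}{\|x\|_2}\,\bigl|\,\|x\|_2 - 1\,\bigr|.$$

Since for $x \leftarrow \mathcal{N}(0, 1/\sqrt{n})^n$, $\langle w, x\rangle$ is distributed as $\mathcal{N}(0, 1/\sqrt{n})$, for some constant $c_1$,

$$\Pr_{x \leftarrow \mathcal{N}(0,1/\sqrt{n})^n}\left[\,|\langle w, x\rangle| \ge \frac{c_1\sqrt{\log n}}{n^{1/2}}\right] \le \frac{1}{n}.$$

Further, by applying the Chernoff bound to $\|x\|_2$, it follows that for some constant $c_2 > 0$,

$$\Pr_{x \leftarrow \mathcal{N}(0,1/\sqrt{n})^n}\left[\,\bigl|\,\|x\|_2 - 1\,\bigr| \ge \frac{c_2\sqrt{\log n}}{n^{1/4}}\right] \le \frac{1}{n}.$$

Combining the above equations we get

$$\Pr_{x \leftarrow \mathcal{N}(0,1/\sqrt{n})^n}\left[\,\left|\langle w, x\rangle - \frac{\langle w, x\rangle}{\|x\|_2}\right| \ge \frac{c_1 c_2 \log n}{n^{3/4}}\right] \le \frac{2}{n}.$$

Therefore, for $C = c_1 c_2$,

$$\begin{aligned}
\left|\Pr_{x \leftarrow \mathcal{N}(0,1/\sqrt{n})^n}\bigl[\langle w, x\rangle \ge \theta\bigr] - \Pr_{x \leftarrow \mathcal{N}(0,1/\sqrt{n})^n}\left[\frac{\langle w, x\rangle}{\|x\|_2} \ge \theta\right]\right|
&\le \Pr_{x \leftarrow \mathcal{N}(0,1/\sqrt{n})^n}\left[\,|\langle w, x\rangle - \theta| \le \left|\langle w, x\rangle - \frac{\langle w, x\rangle}{\|x\|_2}\right|\,\right] \\
&\le \Pr_{x \leftarrow \mathcal{N}(0,1/\sqrt{n})^n}\left[\,|\langle w, x\rangle - \theta| \le \frac{c_1 c_2 \log n}{n^{3/4}}\right] + \frac{2}{n} \\
&\le \frac{C \log n}{n^{1/4}},
\end{aligned}$$

where the last inequality follows from the fact that $\langle w, x\rangle$ is distributed as $\mathcal{N}(0, 1/\sqrt{n})$ and for any
interval $I \subseteq \mathbb{R}$, $\Pr_{x \leftarrow \mathcal{N}(0,1)}[x \in I] = O(|I|)$. □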
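
The coupling used in the proof of Lemma 6.2 can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the construction: it draws $x$ from $\mathcal{N}(0, 1/\sqrt{n})^n$, evaluates the same halfspace on $x$ and on $x/\|x\|_2$, and reports the empirical gap between the two acceptance probabilities. The function names, the random choice of $w$, and the sample sizes are illustrative assumptions.

```python
import math
import random

def gaussian_vector(n, rng):
    """Sample x from N(0, 1/sqrt(n))^n: each coordinate has standard deviation 1/sqrt(n)."""
    sigma = 1.0 / math.sqrt(n)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def normalize(x):
    """Project a nonzero vector onto the unit sphere."""
    norm = math.sqrt(sum(t * t for t in x))
    return [t / norm for t in x]

def halfspace(w, theta, x):
    """H_{w,theta}(x) = 1 if <w, x> >= theta, and -1 otherwise."""
    return 1 if sum(a * b for a, b in zip(w, x)) >= theta else -1

def estimate_gap(n, theta, trials, seed=0):
    """Empirical |Pr[H(x/||x||) = 1] - Pr[H(x) = 1]| using the same Gaussian samples for both."""
    rng = random.Random(seed)
    w = normalize([rng.gauss(0.0, 1.0) for _ in range(n)])  # arbitrary unit normal vector
    on_sphere = 0
    on_gauss = 0
    for _ in range(trials):
        x = gaussian_vector(n, rng)
        on_gauss += halfspace(w, theta, x) == 1
        on_sphere += halfspace(w, theta, normalize(x)) == 1
    return abs(on_sphere - on_gauss) / trials

print(estimate_gap(64, 0.5 / math.sqrt(64), 2000))
```

The two evaluations can disagree only when $\langle w, x\rangle$ lies within $\theta\,|\|x\|_2 - 1|$ of the threshold, which is the event bounded in the proof. For $\theta = 0$ the two halfspaces agree on every sample, since scaling by a positive number does not change the sign of $\langle w, x\rangle$.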

Now, by Theorem 4.3, for $\epsilon$-regular $w$ and $x$ generated from $G$ with parameters as in Theorem 4.3,
the distribution of $\langle w, x/\sqrt{n}\rangle$ is $O(\epsilon)$-close to $\mathcal{N}(0, 1/\sqrt{n})$. It then follows from the above
lemma that $G$ $\epsilon$-fools spherical caps $S_{w,\theta}$ when $w$ is $\epsilon$-regular and $\epsilon > C \log n / n^{1/4}$. We now reduce
the case of arbitrary spherical caps to regular spherical caps.

Observe that the volume of a spherical cap $S_{w,\theta}$ is invariant under rotations: for any unitary
matrix $A \in \mathbb{R}^{n \times n}$ with $A^T A = I_n$,

$$\Pr_{x \leftarrow U_{sp}}[x \in S_{w,\theta}] = \Pr_{x \leftarrow U_{sp}}[Ax \in S_{w,\theta}].$$



We exploit this fact by using a family of rotations $\mathcal{R}$ of Ailon and Chazelle [AC06] which
satisfies the property that for any $w \in \mathbb{R}^n$ and a random rotation $V \in_u \mathcal{R}$, $Vw$ is regular with high
probability. Let $H \in \mathbb{R}^{n \times n}$ be the normalized Hadamard matrix such that $H^T H = I_n$ and each
entry $H_{ij} \in \{\pm 1/\sqrt{n}\}$. For a vector $x \in \mathbb{R}^n$, let $D(x)$ denote the diagonal matrix with diagonal
entries given by $x$. Observe that for $x \in \{1,-1\}^n$, $HD(x)$ is a unitary matrix. Ailon and Chazelle
(essentially) show that for any $w \in \mathbb{R}^n$ and $x \in_u \{1,-1\}^n$, $HD(x)w$ is $O(\sqrt{\log n}/\sqrt{n})$-regular. We
derandomize their construction by showing that similar guarantees hold for $x$ chosen from an 8-wise
independent distribution.
Lemma 6.3. For all $w \in \mathbb{R}^n$ with $\|w\| = 1$, and $x \in \{1,-1\}^n$ chosen from an 8-wise independent
distribution, the following holds. For $v = HD(x)w$ and $\gamma > 0$,

$$\Pr\Bigl[\,\sum_i v_i^4 \ge \frac{\gamma}{n}\Bigr] = O\Bigl(\frac{1}{\gamma^2}\Bigr).$$

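Proof. Let random variable $Z = \sum_i v_i^4$. Observe that each $v_i$ is a linear function of $x$ and

$$\mathbb{E}[v_i^2] = \sum_j H_{ij}^2 w_j^2 = \frac{1}{n}.$$

Note that since $x$ is 8-wise independent, we can apply (2,4)-hypercontractivity, Lemma 5.5, to $v_i$.
Thus,

$$\mathbb{E}[Z] = \sum_i \mathbb{E}[v_i^4] \le 9 \sum_i \mathbb{E}[v_i^2]^2 \le \frac{9}{n}.$$

Similarly, by (2,4)-hypercontractivity applied to the quadratics $v_i^2, v_j^2$,

$$\mathbb{E}[Z^2] = \sum_{i,j} \mathbb{E}[v_i^4 v_j^4] \le \sum_{i,j} 9^2\, \mathbb{E}[v_i^4]\, \mathbb{E}[v_j^4] \le 9^2\, \mathbb{E}[Z]^2 \le \frac{9^4}{n^2}.$$

The lemma now follows from the above equation and Markov's inequality applied to $Z^2$. □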
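
To make the quantity in Lemma 6.3 concrete, here is a small sketch that applies the rotation $v = HD(x)w$ with a fast Walsh-Hadamard transform and measures $\sum_i v_i^4$. It rests on two assumptions that are not part of the lemma: $H$ is taken to be the Sylvester Hadamard matrix (so $n$ must be a power of two), and $x$ is drawn uniformly from $\{1,-1\}^n$, which is in particular 8-wise independent. The function names are illustrative.

```python
import math
import random

def normalized_hadamard(v):
    """Return H v, where H is the n x n Sylvester Hadamard matrix scaled by 1/sqrt(n)."""
    v = list(v)
    n = len(v)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        # Butterfly step: combine blocks of size h into blocks of size 2h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = v[i], v[i + h]
                v[i], v[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [t * scale for t in v]

def rotate(w, x):
    """Compute v = H D(x) w for a sign vector x in {1,-1}^n."""
    return normalized_hadamard([xi * wi for xi, wi in zip(x, w)])

def fourth_moment(v):
    """The regularity measure sum_i v_i^4 from Lemma 6.3."""
    return sum(t ** 4 for t in v)

n = 16
rng = random.Random(1)
x = [rng.choice((1, -1)) for _ in range(n)]   # assumption: fully uniform signs
w = [1.0] + [0.0] * (n - 1)                   # a maximally non-regular unit vector
v = rotate(w, x)
print(fourth_moment(w), fourth_moment(v))
```

For the standard basis vector $w = e_1$, every entry of $v$ has magnitude $1/\sqrt{n}$, so $\sum_i v_i^4$ drops from $1$ to exactly $1/n$, while $\|v\|_2 = \|w\|_2 = 1$ because $HD(x)$ is unitary.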

Combining the above lemmas we get the following analogue of Theorem 4.3 for spherical caps.
Let $G$ be as in Theorem 4.3 and let $\mathcal{D}$ be an 8-wise independent distribution over $\{1,-1\}^n$. Define
$G_{sph} : \{1,-1\}^n \times \{0,1\}^r \to S_{n-1}$ by

$$G_{sph}(x, y) = \frac{D(x) H^T G(y)}{\sqrt{n}}.$$

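Theorem 6.4. For any spherical cap $S_{w,\theta}$ with $\|w\| = 1$ and $\epsilon > C \log n / n^{1/4}$,

$$\Bigl|\, \Pr_{z \leftarrow U_{sp}}[\langle w, z\rangle \ge \theta] - \Pr_{x \leftarrow \mathcal{D},\, y \in_u \{0,1\}^r}[\langle w, G_{sph}(x, y)\rangle \ge \theta] \,\Bigr| = O(\epsilon).$$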

Proof. By Lemma 6.2, for $z \leftarrow U_{sp}$, $\langle w, z\rangle$ is $O(\epsilon)$-close to $\mathcal{N}(0, 1/\sqrt{n})$. Further, by applying
Lemma 6.3 for $\gamma = 1/\sqrt{\epsilon}$, we get that $v = HD(x)w$ is $\delta$-regular with probability at least $1 - O(\epsilon)$
for $\delta = 1/(\sqrt{n}\,\epsilon^{1/4}) < \epsilon$. Now, by Theorem 4.3 for $v$ $\epsilon$-regular and $y \in_u \{0,1\}^r$, the distribution of
$\langle v, G(y)\rangle$ is $O(\epsilon)$-close to $\mathcal{N}(0, 1)$. The theorem now follows from combining the above claims and
noting that $\langle v, G(y)/\sqrt{n}\rangle = \langle w, G_{sph}(x, y)\rangle$. □
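
The identity closing the proof can be checked directly on small instances. The sketch below assumes the map has the form $G_{sph}(x, y) = D(x) H^T G(y)/\sqrt{n}$, which is the form that makes the identity hold for $v = HD(x)w$. The generator $G$ of Theorem 4.3 is not implemented here: an arbitrary sign vector `g` stands in for its output $G(y)$, and all function names are hypothetical.

```python
import math
import random

def hadamard(n):
    """Normalized Sylvester Hadamard matrix: H[i][j] = (-1)^{popcount(i & j)} / sqrt(n)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    s = 1.0 / math.sqrt(n)
    return [[s * (-1) ** bin(i & j).count("1") for j in range(n)] for i in range(n)]

def mat_vec(A, u):
    return [sum(a * b for a, b in zip(row, u)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def g_sph(H, x, g):
    """Assumed form D(x) H^T g / sqrt(n), with g in {1,-1}^n standing in for G(y)."""
    n = len(x)
    u = mat_vec(transpose(H), [t / math.sqrt(n) for t in g])
    return [xi * ui for xi, ui in zip(x, u)]

n = 8
rng = random.Random(7)
H = hadamard(n)
raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
norm = math.sqrt(dot(raw, raw))
w = [t / norm for t in raw]                      # unit normal of the cap
x = [rng.choice((1, -1)) for _ in range(n)]      # rotation seed
g = [rng.choice((1, -1)) for _ in range(n)]      # stand-in for G(y)

v = mat_vec(H, [xi * wi for xi, wi in zip(x, w)])   # v = H D(x) w
z = g_sph(H, x, g)
print(dot(v, [t / math.sqrt(n) for t in g]), dot(w, z), dot(z, z))
```

The first two printed values agree up to floating-point error, and the third is $1$, reflecting that the output lies on $S_{n-1}$ whenever the stand-in for $G(y)$ is a sign vector.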

Theorem 1.4 now follows from the above theorem and derandomizing G as done in Section 4.3 
for proving Theorem 1.3. 
A Non-Explicit Bounds

It is known ([LC67], [RSOK91]) that the number of distinct halfspaces on $n$ bits is at most $2^{n^2}$. One
way of extending this bound to degree $d$ PTFs is as follows. It is known that the Fourier coefficients
of the first $d+1$ levels of a degree $d$ PTF, also known as the Chow parameters, determine the PTF
completely (see [OS08]). Thus, a PTF $f$ is completely determined by $\mathrm{ChowParam}(f) = (\mathbb{E}[f \cdot \chi_I] :
I \subseteq [n], |I| \le d)$, where $\chi_I(x) = \prod_{i \in I} x_i$ denotes the parity over the coordinates in $I$. Observe that
for any $I \subseteq [n]$, $\mathbb{E}[f \cdot \chi_I] \in \{i/2^n : i \in \mathbb{Z}, |i| \le 2^n\}$. Therefore, the number of distinct degree $d$
PTFs is at most the number of distinct sequences $\mathrm{ChowParam}(\cdot)$, which in turn is at most $2^{O(n^{d+1})}$,
since there are $O(n^d)$ sets $I$ and at most $2^{n+1} + 1$ possible values for each coefficient.
The non-explicit bound now follows by observing that any class of boolean functions $\mathcal{F}$ can be
fooled with error at most $\epsilon$ by a set of size at most $O(\log(|\mathcal{F}|)/\epsilon^2)$. Thus, degree $d$ PTFs can be
fooled by a sample space of size at most $O(n^{d+1}/\epsilon^2)$.
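
For very small $n$ the counting argument can be checked by brute force. The sketch below enumerates halfspaces on $n = 2$ bits, computes their Chow parameters exactly, and confirms that distinct halfspaces have distinct parameter vectors. The weight grid $\{-1, 0, 1\}$ and the half-integer thresholds are assumptions that suffice to realize every halfspace for $n = 2$; they are not claimed to cover all halfspaces for larger $n$.

```python
from fractions import Fraction
from itertools import product

def truth_table(w, theta, points):
    """Values of sign(<w, x> - theta) on all points; half-integer theta avoids ties."""
    return tuple(1 if sum(a * b for a, b in zip(w, x)) - theta > 0 else -1
                 for x in points)

def chow_parameters(table, points):
    """(E[f], E[f x_1], ..., E[f x_n]) under the uniform distribution, as exact fractions."""
    n = len(points[0])
    size = len(points)
    params = [Fraction(sum(table), size)]
    for i in range(n):
        params.append(Fraction(sum(f * x[i] for f, x in zip(table, points)), size))
    return tuple(params)

def enumerate_halfspaces(n, weight_range, thetas):
    """Collect the distinct truth tables realized by the given weights and thresholds."""
    points = list(product((1, -1), repeat=n))
    tables = set()
    for w in product(weight_range, repeat=n):
        for theta in thetas:
            tables.add(truth_table(w, theta, points))
    return points, tables

n = 2
thetas = [Fraction(k, 2) for k in range(-5, 6, 2)]   # -5/2, -3/2, ..., 5/2
points, tables = enumerate_halfspaces(n, (-1, 0, 1), thetas)
chow = {chow_parameters(t, points) for t in tables}
print(len(tables), len(chow), 2 ** (n * n))
```

Every boolean function on two bits except the two parities is a halfspace, and each Chow parameter is an integer multiple of $1/2^n$, in line with the observation used in the counting argument.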